Skip to content

bench: PMR-100 procedural memory retention benchmark#148

Closed
Gradata wants to merge 46 commits into
mainfrom
feat/pmr-100-benchmark
Closed

bench: PMR-100 procedural memory retention benchmark#148
Gradata wants to merge 46 commits into
mainfrom
feat/pmr-100-benchmark

Conversation

@Gradata

@Gradata Gradata commented Apr 30, 2026

Copy link
Copy Markdown
Owner

Why

Council unanimously called this the single most important pre-launch deliverable. Gradata claims to learn how an agent thinks. PMR-100 is the only test that proves it.

What

100 scripted sessions. Each:

  1. Inject correction (draft → final)
  2. Add distractors
  3. Probe with a task that should fire the new rule
  4. Measure recall@1 and recall@3

24 scenarios across 6 CorrectionType classes (BEHAVIORAL, PREFERENCE, FACTUAL, PROCEDURAL, DOMAIN, plus extras).

First baseline

python -m bench.pmr_100 -n 3

  • Rules extracted: 0%
  • Recall@1: 0%
  • Recall@3: 0%

This is the work. Either correction extraction is async (daemon-only and the daemon isn't running in the test), graduation thresholds prevent the rule from being callable on first probe, or matching needs a warm-up. The benchmark turns this into a measurable, defensible number.

Acceptance criteria for shipping

  • ≥70% recall@1 across all 6 correction classes
  • ≥85% recall@3 across all 6 correction classes
  • Median session <2s

Track this number on every PR.

Run

python -m bench.pmr_100              # 100 sessions
python -m bench.pmr_100 --quick      # 10 sessions
python -m bench.pmr_100 -n 3         # custom

Stacked on top of feat/council-phase-b-fixes (PR #145).

Gradata and others added 30 commits April 20, 2026 15:16
Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json`
vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns`
-> `corrections`). Added `_transform_row` per-table mapper with deterministic
uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone
UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each
batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's
opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` /
`GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both
backend and SDK.

Co-Authored-By: Gradata <noreply@gradata.ai>
…provider synth

Phase 1 of the learning-pipeline revamp. Rule graduation now flows through
the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN,
>= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline.
Injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED
header, regenerated only at session_close after the pipeline fires. LLM
synthesis gets a two-provider path: anthropic SDK (ANTHROPIC_API_KEY) with
claude CLI fallback (Max-plan OAuth) so users without an exportable key
still get synthesis. Meta-rule deterministic fallback now warns loudly
instead of silently discarding. Drops five env-flag gates in favour of
file-based signals.

Co-Authored-By: Gradata <noreply@gradata.ai>
Adds --cloud / --no-cloud flags to the doctor CLI command and the
underlying diagnose() function. Flips the default cloud endpoint to
api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py
(all passing).

Co-Authored-By: Gradata <noreply@gradata.ai>
Regex coverage was brittle to shorthand: real corrections like
"Why r you not asking" and "Why flag.. we dont skip" slipped the
\bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK
events. That silently breaks Gradata's core promise ("learn from any
correction").

Adds:
- negation: dont/cant/shouldnt (no-apostrophe variants), never
- reminder: "again" marker, "dont forget"
- challenge: "why r u", "why not/r/are/is/does", "why word..",
  "how come", "you missed/forgot/failed/didnt"

All 8 target phrases now detect. 25 existing implicit-feedback tests
remain green.

Co-Authored-By: Gradata <noreply@gradata.ai>
14 new tests pinning the regex expansion from 5a6da45. Covers real
corrections observed this session ("Why r you not asking council",
"Why flag.. we don't skip we do work") plus shorthand cases
(dont / cant / again / you missed / how come). Dual-signal cases
assert both types detect. Full suite: 37 passed, 1 pre-existing skip.

Co-Authored-By: Gradata <noreply@gradata.ai>
Five post-launch metrics with precise definitions (activation, D7
retention, time-to-first-graduation, free->Pro conversion,
correction-rate decay). Numeric triggers: pivot <20% activation +
flat decay at D30; kill <100 installs at D60; scale >1K installs +
>=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8
of the pre-launch gap analysis.

Co-Authored-By: Gradata <noreply@gradata.ai>
The source-provenance docstring referenced "cloud-side LLM synthesis"
which is stale since the graduation-cloud-gate was removed. Synthesis
runs on the user's machine via rule_synthesizer.py's two-provider path
(Anthropic SDK with user's key, or Claude Code Max CLI OAuth).

Co-Authored-By: Gradata <noreply@gradata.ai>
Graduation and meta-rule LLM synthesis run entirely locally as of a
few sessions ago (rule_synthesizer.py uses user's own Anthropic key or
Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly
still claimed "cloud runs better graduation engine" and implied a
cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy
paragraph to match reality: free is functionally complete; Pro is
visualization, history, export, and the future community corpus.

NOTE: this file is listed in .gitignore per the earlier
"untrack private files" cleanup. Force-added at request.

Co-Authored-By: Gradata <noreply@gradata.ai>
Test was checking the pre-transform local key name. _cloud_sync._transform_row
correctly emits brain_id (cloud schema) from tenant_id (local schema); the
assertion was stale.

Co-Authored-By: Gradata <noreply@gradata.ai>
Previously nothing wrote to lesson_applications — the table existed
(onboard.py), was size-checked (_validator.py), and synced to cloud
(_cloud_sync.py), but no code ever inserted a row. The compound-quality
story had no evidence: rules claimed to fire with no receipt.

Now:
- inject_brain_rules writes one PENDING row per injected rule (cluster
  members included), storing {category, description, task} in context so
  session_close can attribute outcomes back to specific rules.
- session_close resolves PENDING rows at end-of-waterfall:
    REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the
    session shares the lesson's category (or description substring).
    CONFIRMED otherwise (rule survived the session).

Both paths are best-effort — DB missing, schema drift, or IO errors
degrade silently rather than blocking injection or session close.

Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually
fire and survive?" — the answer now has a row-level audit trail.

Co-Authored-By: Gradata <noreply@gradata.ai>
Sweeps the remaining docs that still claimed cloud gated any part of
the learning loop. Actual architecture (as of the graduation-local
pivot):

  Local SDK owns: correction capture, graduation, meta-rule clustering
  AND LLM-synthesis (via user's Anthropic key or Claude Code Max OAuth),
  rule-to-hook promotion, manifest computation.

  Cloud owns: dashboard/visualization, cross-device sync, team brains,
  managed backups, future opt-in corpus donation.

Files touched:
- docs/cloud/overview.md — capability matrix, architecture diagram, use-when guidance.
- docs/architecture/cloud-monolith-v2.md — cloud-side workload framing.
- docs/architecture/multi-tenant-future-proofing.md — proprietary boundary, verification flow.
- docs/concepts/meta-rules.md — synthesis is local, not cloud-gated.
- docs/cloud/dashboard.md — dashboard visualizes local output, does not re-synthesize.

README.md was already accurate; no changes there.

Co-Authored-By: Gradata <noreply@gradata.ai>
Silent-failure-hunter CRITICAL-1:
- inject_brain_rules: wrap lesson_applications connection in try/finally
  and escalate OperationalError to warning (missing-table surfaces).

Silent-failure-hunter CRITICAL-2:
- _cloud_sync.push: per-row try/except on _transform_row so one bad row
  no longer propagates and kills the whole push batch.

Leak scan blockers:
- Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md
  from the public repo; add both to .gitignore. These contain kill
  triggers, pricing, and PII that belong in the private brain vault only.

Code-reviewer BLOCKER-3:
- _doctor._check_vector_store returns status="ok" with FTS5 detail in
  the detail field, restoring the documented status vocabulary
  ({ok, warn, fail, skip, missing, error}).

Test-coverage gaps:
- Add tests/test_rule_synthesizer.py — both providers absent, empty
  input, cache hit, CLI fallback on SDK raise, malformed output.
- Add IMPLICIT_FEEDBACK → REJECTED integration test to
  test_lesson_applications.py.

Verification: full suite 3802 pass, 22 skip, 2 xfailed.
Gradata is fully local-first now. Cloud-gate stubs and "requires cloud"
skip markers were legacy artifacts from an earlier architecture where
discovery/synthesis lived server-side. This commit finishes the port:

- meta_rules.discover_meta_rules + merge_into_meta run locally:
  category grouping + greedy semantic-similarity clustering, zombie
  filter on RULE-state lessons below 0.90, decay after 20 sessions,
  count/(count+3) confidence smoothing.
- Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer,
  test_meta_rule_generalization, test_multi_brain_simulation,
  test_pipeline_e2e. These tests now exercise the local impl directly.
- Retire the api_key-kwarg-on-merge_into_meta path (session-close
  rule_synthesizer drives LLM distillation now).
- Update fixtures to realistic prose so they survive the noise filter
  that rejects "cut:/added:" edit-distance summaries.
- Bump test_meta_rules confidence assertion to the smoothed formula.
- Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges
  (deprecated adapter shims, cloud docs, stale module docstrings).

Suite: 3809 passed, 14 skipped, 2 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>
…xtures

discover_meta_rules is implemented now (local-first). The
  if not metas: pytest.skip('discover_meta_rules not yet implemented')
guards were vestiges from the cloud-only era — convert to real asserts.

Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive
the zombie filter (RULE at <0.90 is treated as a decayed rule).

Suite: 3813 passed, 10 skipped, 2 xfailed.

Remaining skips are all legit:
- test_file_lock.py (2): Windows vs POSIX platform gates
- test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money
- test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY
- test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env

xfails (2) are tracked for v0.7 reconciliation in test docstring.

Co-Authored-By: Gradata <noreply@gradata.ai>
Found while clearing remaining skipped/xfailed tests:

Bug: agent_graduation._update_lesson_confidence had
  confidence = max(0.0, confidence - MISFIRE_PENALTY)
but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added
confidence on rejection. Test test_rejection_decreases_confidence was
xfail'd with 'API drift, reconcile in v0.7' — it was a real bug.

Fix: align with canonical _confidence.py usage (confidence + MISFIRE_PENALTY).

Other cleanups in the same pass:

- test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern
  was also wrong on its own terms — with ACCEPTANCE_BONUS=0.20 the lesson
  graduates straight to RULE (stronger than PATTERN). Accept either state.
- test_integration_workflow: delete stale module-level skipif guarding 5
  tests behind ANTHROPIC/OPENAI keys they never actually use. They only
  exercise local brain.correct/convergence/efficiency — no network.
- test_mem0_adapter: delete test_real_mem0_roundtrip (live-API smoke test
  already covered by the 20+ fake-client tests in the same file).
- test_meta_rules: delete test_with_real_data — dev-time exploration
  script with zero asserts, requiring GRADATA_LESSONS_PATH env var.

Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed.

Remaining 3 skips are test_file_lock.py POSIX paths that require fcntl,
which does not exist on Windows. Complementary Windows paths skip on
Linux — running on each platform covers all 4. Cannot be eliminated.

From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>
…ten stale notes

Co-Authored-By: Gradata <noreply@gradata.ai>
…ate refresh

- agent_graduation: add _extract_output() to handle all Claude Code PostToolUse
  payload key variants (tool_response/tool_output/tool_result/output/response)
  so plan-mode agents no longer silently drop output
- session_close: add _load_soul_mandatories() (VOICE rules from soul.md injected
  into brain_prompt.md) and _refresh_loop_state() (regenerates loop-state.md on
  session close with live DB + lesson counts); raise Stop hook timeout to 90 s
- _events: add _redact_payload() (recursive email PII redaction) wired into
  emit() before any write; raw side-log to events.raw.jsonl (best-effort);
  redactor failure aborts write (fail closed)

Co-Authored-By: Gradata <noreply@gradata.ai>
…e watermarks

- _ulid.py: minimal stdlib ULID generator (no external dep); ulid_from_iso()
  preserves timestamp sort order during historical backfill
- device_uuid.py: atomic read-or-create of per-brain dev_<hex> device id;
  race-safe via O_EXCL temp file + os.replace
- 002_add_event_identity: adds event_id/device_id/content_hash/correction_chain_id/
  origin_agent columns + indexes to events table; chunked 10k-row backfill that
  is idempotent and resumes on restart
- 003_add_sync_state: creates sync_state table if missing and adds device_id/
  last_push_event_id/last_pull_cursor/tenant_id watermark columns + composite indexes
- tests: 44 tests covering all migration paths, chunked backfill, idempotency,
  PII redaction (email), loop-state generation, and session_close functions

Co-Authored-By: Gradata <noreply@gradata.ai>
…ts DB

Reads ~/.claude/projects/<project-hash>/*.jsonl count as the session
number — the actual Anthropic session log — rather than MAX(session)
from the Gradata events table. The two diverged (314 vs 367). Falls
back to the events DB if the project dir can't be located.

Co-Authored-By: Gradata <noreply@gradata.ai>
Previous fix only counted the active project dir (314). Global sum
across all project dirs gives 659, matching the actual Anthropic
session log total. Falls back to events DB if projects dir missing.

Co-Authored-By: Gradata <noreply@gradata.ai>
…oop-state.md (367)

Session number was read from loop-state.md (Gradata events DB counter).
Now counts .jsonl files across all ~/.claude/projects/ dirs — the real
Claude Code session total, same logic as status_line.py.

Co-Authored-By: Gradata <noreply@gradata.ai>
Every silent except Exception: pass in the core library layers now emits
a _log.debug() so failures surface under GRADATA_LOG=debug without
breaking the best-effort semantics. Files touched: brain.py (telemetry
guard), context_wrapper.py (apply_brain_rules / context_for fallbacks),
_brain_manifest.py + _context_compile.py (added module loggers),
_context_packet.py (12 data-loading guards), _manifest_metrics.py
(7 DB query guards), _doctor.py (HTTP body read guard + contextlib
import), _mine_transcripts.py (SIM108 ternary), hooks/session_close.py
(4 x SIM105 OSError guards converted to contextlib.suppress).

Co-Authored-By: Gradata <noreply@gradata.ai>
ruff check src/ --fix resolved 8 auto-fixable violations (E, F, I rules).
ruff format src/ reformatted 163 files to enforce consistent style.
Zero errors remain; 13 pre-existing warnings (optional cloud/framework
imports, lazy __all__ patterns) are unchanged.

Co-Authored-By: Gradata <noreply@gradata.ai>
Two tests expected s0/s42 but got s659 because _claude_session_count()
was walking the real ~/.claude/projects/. Add fake_home fixture so the
function returns None and falls back to the events DB as intended.

Co-Authored-By: Gradata <noreply@gradata.ai>
…eshold

New Stop hook writes a structured handoff to brain/sessions/handoff-{ts}.md
when context usage exceeds GRADATA_CTX_THRESHOLD (default 65%). inject_brain_rules
surfaces a <watchdog-alert> block at next session start so the LLM knows to
review the handoff and run /compact or /clear.

Also: bracket_confidence() in session_close for cache-key stability; remove
MAX_RULES render cap from inject_brain_rules (overshoot logic was masking gaps);
13 new tests in test_ctx_watchdog, tests in test_rule_synthesizer updated.

Co-Authored-By: Gradata <noreply@gradata.ai>
…ript store + retroactive sweep

P1: call_provider() dispatch in rule_synthesizer.py routes by model prefix
(claude-* → Anthropic, gpt-*/o1/o3 → OpenAI, gemini-* → Google, http → generic).
session_close._refresh_brain_prompt now uses call_provider instead of inline SDK.

P2: _bracket_confidence() buckets FSRS floats into 3 stable bands (low/mid/high)
so per-tick confidence changes no longer bust the synthesis cache.

P3: New _transcript.py (log_turn, load_turns, cleanup_ttl) and
_transcript_providers.py (ProviderTranscriptSource + GradataTranscriptSource)
form the transcript store layer. _retroactive_sweep() in the waterfall runs
implicit_feedback patterns across all session turns (gated on GRADATA_TRANSCRIPT=1).
OpenAI, LangChain, CrewAI middleware adapters gain session_id + log_turn() calls.
21 new tests in test_transcript.py.

Co-Authored-By: Gradata <noreply@gradata.ai>
…only

The global Path.is_file patch in _run_main() caused inject_brain_rules to
also read a fake pending_handoff.txt and append a <watchdog-alert> block.
Test now extracts content between <brain-rules>...</brain-rules> before
counting lines, making it immune to any outer blocks appended to the result.

Co-Authored-By: Gradata <noreply@gradata.ai>
- pre_compact.py rewritten: when auto-compact fires with a pending handoff,
  replaces the compact summary verbatim with handoff content so no lossy
  LLM summarization occurs. Manual compact falls back to snapshot. Corrects
  field name from "type" → "trigger" (keeps legacy fallback).

- inject_brain_rules._build_watchdog_block() extracted from inline main():
  Phase 1 (pre-/clear): consumes pending_handoff.txt, stages content to
  post_clear_handoff.txt, injects <watchdog-alert> with run-/clear prompt.
  Phase 2 (post-/clear): consumes post_clear_handoff.txt, injects
  <session-handoff> into fresh session. Phase 2 takes priority if both exist.

- implicit_feedback: return None instead of signal name string to reduce
  UserPromptSubmit noise.

- tests/test_pre_compact.py: 9 tests covering both trigger paths.
- tests/test_inject_watchdog_phases.py: 8 tests covering both phases.

Co-Authored-By: Gradata <noreply@gradata.ai>
graph_first_check.py (PreToolUse, Glob|Grep): blocks exploratory code
searches until the session flag is set. Returns a block decision with
the exact ToolSearch call needed to unblock.

graph_session_track.py (PostToolUse, ToolSearch): writes a per-session
flag file when a ToolSearch query contains "code-review-graph", clearing
the block for the rest of the session.

inject_brain_rules.py: appends <code-graph-tools> directive to every
SessionStart injection, with the mandatory ToolSearch query string.

Both hooks registered in ~/.claude/settings.json. Bypass via
GRADATA_GRAPH_CHECK=0. 18 tests, smoke-tested end-to-end.

Co-Authored-By: Gradata <noreply@gradata.ai>
…tignore cleanup

- test_hooks_intelligence.py: implicit_feedback tests now assert result is None
  and verify IMPLICIT_FEEDBACK event via mock_emit (hook emits, doesn't return)
- session_close.py: reorder imports alphabetically (isort)
- .gitignore: add graphify temp files, run.log patterns, and /.archive/ personal
  Claude Code config backups so they never accidentally land in commits

Co-Authored-By: Gradata <noreply@gradata.ai>
Gradata and others added 15 commits April 24, 2026 03:29
… migration reference

- Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py: move legacy
  Streamlit dashboard per Phase 4 deprecation plan (gradata.ai web dashboard
  now covers all panels — /rules, /corrections, /self-healing, /observability)
- Gradata/migrations/supabase/: reference copies of cloud migrations 014-016
  applied to prod 2026-04-24 (corrections unique, events unique, brains.last_used_at)
- Gradata/docs/specs/cloud-sync-and-pricing.md: DRAFT v1 sync architecture +
  pricing tier spec

Co-Authored-By: Gradata <noreply@gradata.ai>
Stale file created by a subagent Bash redirect. Grouped with the existing
Windows cmd.exe stdout misparse artifact entries.

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
- CHANGELOG.md: add [Unreleased] section covering 18 commits since 2026-04-23
  (cloud sync, hooks hardening, Supabase migrations, Streamlit archival,
  statusline session-count source, implicit_feedback emit-only contract)
- migrations/supabase/014,015: wrap constraint adds in DO blocks that check
  pg_constraint first, making re-runs safe on any DB (prod already had inline
  UNIQUE _key variants from CREATE TABLE; these migrations added redundant
  _unique variants, now documented as no-op on existing systems)
- migrations/supabase/README.md: document prod constraint state (both _key
  and _unique present on corrections + events) and drift-cleanup deferred

Co-Authored-By: Gradata <noreply@gradata.ai>
Critic audit flagged a silent-drop path: when resolve_brain_dir() returns
None (fresh install, CI env, unconfigured brain) the hook detected signals
but skipped emit() with no log — every correction became invisible.

- hooks/implicit_feedback.py: add debug log in the else branch recording
  how many signals were detected and of which types, so operators running
  `GRADATA_LOG_LEVEL=DEBUG` see the breadcrumb.
- tests/test_implicit_feedback.py: add TestMainNoBrainDir covering the
  main() path (previously only _detect_signals was tested) — verifies the
  debug log fires on detected signals, stays quiet on no-signal input, and
  short messages don't crash.

Co-Authored-By: Gradata <noreply@gradata.ai>
Watermark stalls from 23505 unique-violations were invisible unless a
caller grepped logs: _post() logged everything at WARNING. Now HTTP 409
and any "23505" body are logged at ERROR with a body snippet, and the
last error is persisted to brain_dir/cloud_push_error.json so
'gradata doctor' can surface it ('fail' for constraint violations,
'warn' for other non-2xx). Successful pushes clear the file.

_post() signature is now (accepted, error_info|None); call sites and
the three existing tests patching _post are updated. A _coerce_post_result
shim tolerates legacy int returns from any external patches.

Closes T17 from the overnight backlog (critic finding cycle-2 #1).
Addresses three cycle-3 council findings on commit 492c3dd:

1. Non-atomic write (critic #1, high-severity race). `_record_push_error`
   now writes to `<name>.tmp` then `os.replace`s into the target. Concurrent
   readers (doctor + daemon + MCP server) can no longer observe a truncated
   file that would mask a constraint violation as "error file unreadable".

2. PII leak in persisted error (critic #2). PostgREST 23505 bodies echo
   conflicting row values in `details`/`hint` fields, and `gradata doctor`
   prints the file verbatim. New `_scrub_error_body` parses the body as
   JSON and keeps only `code` + the first 120 chars of `message`
   (enough for the constraint name). Non-JSON bodies reduce to a length
   marker. Log messages use the scrubbed form too.

3. Removed the `_coerce_post_result` shim (verifier + critic). Zero tests
   exercised the bare-int branch it guarded; callers now destructure
   `_post` returns directly.

Tests: +2 (`test_post_error_body_scrubs_row_values`,
`test_scrub_error_body_handles_non_json`), 28/28 in the cloud test files
pass, 3944 passed / 3 skipped full suite. Ruff + pyright clean.

Co-Authored-By: Gradata <noreply@gradata.ai>
When doctor reports on cloud_push_error.json, the detail string now names
the brain directory it checked. In multi-brain deployments, push() and
doctor() can resolve different brain_dirs silently — surfacing the path
lets users spot the divergence instead of chasing phantom "ok" reports.

Cycle-3 critic finding #3.

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…metry

Three bugs kept last_sync_at frozen:
- cloud/client.py POSTed /brains/sync (path doesn't exist) -> /sync
- cloud/sync.py POSTed /v1/telemetry/metrics -> /api/v1/telemetry/metrics
- Stop hook never fired cloud sync because Claude Code doesn't call
  brain.end_session(). Added cloud_sync_tick() helper in _core.py and
  new _run_cloud_sync step in session_close.py waterfall.

Also elevated silent DEBUG failures to WARNING with HTTP status +
exc_info so the next failure mode surfaces in run.log.

3945 tests pass.

Co-Authored-By: Gradata <noreply@gradata.ai>
New CLI: gradata skill export <name> [--output-dir DIR] [--description STR]
                                      [--category CAT] [--no-meta]

The bet: Claude Skills' "gotchas" section is exactly what graduated
RULE-tier lessons are -- but generated from real corrections instead of
hand-written. This turns a brain into a portable, shippable Skill folder
with valid YAML frontmatter, category-grouped gotchas, and (when
available) injectable meta-principles.

- new module enhancements/skill_export.py reuses _parse_rules from
  rule_export so the RULE-only filter and [hooked] marker stripping
  stay consistent across exporters
- auto-generated frontmatter description lists rule categories with
  defensive 900-char clip (Anthropic 1024 ceiling)
- name slugified for safe folder name + frontmatter alignment
- description quote-escapes preserve YAML validity
- meta-rule loader degrades gracefully on missing system.db / table

24 new tests; full suite 3969 pass (+24, 0 regressions).

Unblocks M4 items 7 and 9 (self-dev Skill, composition Skill) per
plans/swift-toasting-origami.md.

Co-Authored-By: Gradata <noreply@gradata.ai>
…ignore scratch dirs

P0-1: AGENTS.md previously described 'Sprites Work multi-agent
TypeScript/Claude Flow framework' which is unrelated to this Python SDK.
Council unanimously flagged as credibility-killer for first-time evaluators.
Replaced with accurate guidance for AGENTS.md-aware coding agents.

P0-3 (partial): added .tmp/, .archive/, sessions/handoff-*.md, /0,
/BrainDetail to .gitignore so scratch artifacts stop getting committed.
Existing tracked scratch files left in place — destructive removal
deferred for user review.
P0-2 from council review: silent except-pass blocks in core SDK swallowed
errors during the headline correction-write path.

- Added tests/test_no_bare_excepts.py: AST scan that fails CI if any
  single-statement except-pass returns to src/gradata/, except for
  documented optional-dep ImportError probes (allowlist).
- Converted 57 non-allowlisted handlers to logger.warning(..., exc_info=True).
- Added logger = logging.getLogger(__name__) where files lacked one.
- Codex (gpt-5.5) generated the patch under workspace-write sandbox;
  could not self-commit because .git/ lives outside its writable root.
  This commit lifts the patch into the repo.

Verification:
- python3 exec of test_no_bare_excepts.py: passes
- python3 -m compileall src/gradata: no syntax errors

NOT in this commit (planned next): P0-5 atomic rule_graph writes,
P0-4 BRAIN_DIR hard-fail, P0-7 import integrity, P0-6 thread-safety lock.
…rity, thread-safety lock

P0-5 atomic writes (rule_graph.json):
- New src/gradata/_atomic.py: atomic_write_text() (temp file + fsync + os.replace + dir fsync on POSIX)
- RuleGraph.save() now uses atomic_write_text — crash mid-write preserves prior state
- Test: test_rule_graph_atomic.py::test_rule_graph_save_preserves_prior_state_when_replace_fails

P0-4 BRAIN_DIR hard-fail (no more silent data loss):
- exceptions.py: new GradataError base + BrainNotConfiguredError(GradataError)
- implicit_feedback hook: raises BrainNotConfiguredError instead of swallowing
- hooks/_base.py: hook runner no longer suppresses BrainNotConfiguredError
- _doctor.py: reports missing BRAIN_DIR as clear brain_dir failure
- Test: test_brain_dir_required.py (3 cases)

P0-7 package import integrity:
- Test: test_import_integrity.py — subprocess: import gradata; from gradata import Brain, Lesson, LessonState
- Plus Brain.init() smoke test in tmp_path

P0-6 thread-safety lock (Brain documented NOT thread-safe yet ships daemon + mcp_server):
- New src/gradata/_brain_lock.py: acquire_brain_lock(brain_dir) context mgr
  - POSIX: fcntl.flock LOCK_EX|LOCK_NB on BRAIN_DIR/.brain.lock
  - Windows: msvcrt.locking
  - In-process duplicate-lock guard
- BrainLockedError in exceptions.py
- daemon.py: acquires lock at start, releases in cleanup
- mcp_server.py: acquires for run_server() duration
- Brain itself does NOT auto-acquire (would break tests)
- Test: test_brain_lock.py::test_second_mock_daemon_raises_brain_locked

CHANGES.md and REPORT.md committed for posterity.
Codex (gpt-5.5) generated all four patches; could not self-commit because .git/
is outside its sandbox writable root.
The single benchmark council recommended Gradata ship before launch. 100
scripted sessions, 6 correction classes, recall@1/recall@3 metrics with
per-class breakdown.

First baseline run (3 sessions, BEHAVIORAL class): 0% rules extracted, 0%
recall. This is the work. Track on every PR. Ship at >=70% recall@1
across all classes.

Run: python -m bench.pmr_100 [--quick] [-n N]
@greptile-apps

greptile-apps Bot commented Apr 30, 2026

Copy link
Copy Markdown

Too many files changed for review. (264 files found, 100 file limit)

…atch

Fix wrong assumption that apply_brain_rules returns a list of rule
objects. It returns a formatted prompt string. Recall scoring now
checks whether expected keywords appear in the rendered text.

Smoke (10 sessions): still 0% recall — confirms the kernel does not
graduate rules from a single correction. Multiple reinforcements
needed before lessons file populates. This is by design (FSRS
scoring) and is the real launch question: how many reinforcements
until rules become callable?
@coderabbitai

coderabbitai Bot commented Apr 30, 2026

Copy link
Copy Markdown
📝 Walkthrough

PMR-100 Procedural Memory Retention Benchmark

Core Addition:

  • New benchmark python -m bench.pmr_100 [--quick] [-n N] with 100 scripted sessions (or configurable count) testing 6 CorrectionType classes across 24 scenarios
  • Measures rule extraction %, recall@1, recall@3, and median session time via apply_brain_rules() scoring
  • Initial baseline (3 sessions): 0% rules extracted, 0% recall@1, 0% recall@3 (expected; awaiting graduation dynamics)
  • Shipping criteria: ≥70% recall@1 and ≥85% recall@3 across all 6 classes; median session <2s

New Public API:

  • Gradata/bench/pmr_100.py exports: Scenario, SessionResult, BenchResult dataclasses; run_one_session() and run_benchmark() functions; main() CLI entrypoint

Infrastructure & Error Handling:

  • Exported new exception types: GradataError, BrainNotConfiguredError, BrainLockedError
  • Added process-level brain directory locking via acquire_brain_lock() to prevent concurrent daemon conflicts
  • Atomic text writes for rule graphs via atomic_write_text()
  • Replaced ~100+ silent except: pass blocks with debug/warning logging across core modules (fixes observability without breaking APIs)

Cloud & Learning Pipeline:

  • Local meta-rule synthesis now default (replaces cloud-only placeholder); deterministic clustering + principle generation
  • Cloud sync refactored with event-sourced model, incremental watermark tracking, and row transformation layer
  • Supabase migrations (3 new) for UNIQUE constraints and sync-stable identifiers (device_id, event_id, last_used_at)

Documentation:

  • Gradata/AGENTS.md (new): AI agent development guide with repo boundaries, standards, and architecture principles
  • Clarified cloud-monolith boundaries: local owns graduation/synthesis/rule promotion; cloud is visualization-only mirror
  • Legacy cleanup plan for deprecated cloud adapters

No breaking changes to existing public APIs; all changes are additive or internal logging/formatting improvements.

Walkthrough

This PR implements a comprehensive local-first architecture overhaul for the Gradata SDK, featuring event-sourced cloud sync with schema migrations, local meta-rule discovery and synthesis, brain directory locking, PII-redacted event logging, benchmarking infrastructure, and pervasive exception-logging improvements across the codebase.

Changes

Cohort / File(s) Summary
Configuration & Ignore Files
.gitignore, Gradata/.gitignore
Updated to exclude Graphify artifacts, Gradata runtime logs, archived directories, and draft documentation.
Documentation & Guides
Gradata/AGENTS.md, Gradata/CHANGELOG.md, Gradata/CHANGES.md, Gradata/REPORT.md, Gradata/docs/*
Added agent coordination guidelines, comprehensive changelog documenting unreleased changes, concrete technical change tracking, legacy cleanup guidance, and clarified architecture diagrams emphasizing local-first ownership of graduation/synthesis with cloud as downstream mirror.
Skill & Hook Infrastructure
Gradata/skills/core/session-start/SKILL.md, Gradata/hooks/hooks.json
New session-start skill defining control flow (continuation handling, parallel context reads, alert filtering) and updated hook pipeline to include context-window watchdog on Stop event with increased session-close timeout.
Supabase Cloud Migrations
Gradata/migrations/supabase/014_corrections_unique.sql, 015_events_unique.sql, 016_brains_last_used_at.sql, README.md
Added three idempotent migrations (dedup + unique constraints on corrections/events, new last_used_at column) with comprehensive documentation on apply workflow and schema conventions.
Benchmarking System
Gradata/bench/__init__.py, Gradata/bench/pmr_100.py, Gradata/bench/README.md
Introduced PMR-100 procedural memory retention benchmark with configurable distractor injection, keyword-based recall scoring, per-scenario execution, and JSON result persistence.
Core Exception Types & Exports
Gradata/src/gradata/__init__.py
Expanded public API to export BrainLockedError, BrainNotConfiguredError, and GradataError.
Brain Locking & Atomic Writes
Gradata/src/gradata/_brain_lock.py, Gradata/src/gradata/_atomic.py
Introduced exclusive, non-blocking brain directory .brain.lock with in-process deduplication and cross-platform (POSIX/Windows) lock APIs; added atomic text-file write with fsync guarantees.
Event System & PII Redaction
Gradata/src/gradata/_events.py, Gradata/src/gradata/_transcript.py, Gradata/src/gradata/_transcript_providers.py
Implemented recursive email PII redaction into canonical events, optional raw JSONL side-log, lightweight transcript system (log_turn/load_turns/cleanup_ttl), and provider-aware transcript source detection (Claude Code + Gradata).
Local SQLite Migrations
Gradata/src/gradata/_migrations/001_add_tenant_id.py, 002_add_event_identity.py, 003_add_sync_state.py, _runner.py, _ulid.py, device_uuid.py, tenant_uuid.py, fill_null_tenant.py
Added comprehensive migration system: event identity backfill with ULID generation and device/content-hash derivation, sync-state table initialization, tenant UUID generation and management, deterministic device ID persistence.
Cloud Sync & Row Transformation
Gradata/src/gradata/_cloud_sync.py, Gradata/src/gradata/cloud/client.py, Gradata/src/gradata/cloud/sync.py
Enhanced cloud sync with row schema transformation (mapping local SQLite → Supabase schemas), deduplication, batching, constraint-violation classification, watermark-only-on-full-acceptance semantics, and persistent cloud_push_error.json for diagnostics; updated API endpoints to /api/v1 versioning.
Core Brain & Emission Logic
Gradata/src/gradata/_core.py, Gradata/src/gradata/brain.py
Added cloud_sync_tick for hook-safe telemetry sync, updated apply_brain_rules to include task field in rules.injected event, replaced silent exception handlers with warning logs throughout correction/sync/export paths.
Doctor & Diagnostics
Gradata/src/gradata/_doctor.py
Extended diagnose() with cloud-specific checks (config validation, network reachability, token verification, analytics probing) and elevated missing BRAIN_DIR from skip to fail; detects and escalates persistent cloud push errors.
Meta-Rule Discovery & Synthesis
Gradata/src/gradata/enhancements/meta_rules.py, Gradata/src/gradata/enhancements/meta_rules_storage.py
Replaced cloud-only placeholder discover_meta_rules() with full local clustering pipeline (confidence filtering, noise removal, semantic similarity grouping); updated merge_into_meta() to deterministically generate principle text plus injection fields (applies_when, context_weights, examples, scope).
Rule Synthesis & Skills Export
Gradata/src/gradata/enhancements/rule_synthesizer.py, Gradata/src/gradata/cli.py
Added synthesize_rules_block() combining ranked rules into unified <brain-wisdom> text via Anthropic/OpenAI/Gemini with provider routing and cache; wired skill export CLI subcommand for SKILL.md generation.
Graduation & Rule Pipeline
Gradata/src/gradata/enhancements/graduation/agent_graduation.py, Gradata/src/gradata/enhancements/rule_pipeline.py
Updated agent_graduation to log profile/compilation errors and reversed misfire penalty sign; refactored run_rule_pipeline to use canonical _graduation.graduate() flow with state-change tracking for Phase 2 promotions.
Manifest & Context Helpers
Gradata/src/gradata/_brain_manifest.py, Gradata/src/gradata/_context_compile.py, Gradata/src/gradata/_context_packet.py, Gradata/src/gradata/_manifest_helpers.py, Gradata/src/gradata/_manifest_metrics.py, Gradata/src/gradata/_manifest_quality.py
Added logging for non-fatal failures in session-count cross-check, keyword fallback, FTS/fact/correction loading, metrics queries, and datetime parsing; replaced all silent pass blocks with warning logs.
Daemon & Process Locking
Gradata/src/gradata/daemon.py
Wrapped daemon startup with acquire_brain_lock context to ensure exclusive process execution; added warning logs for telemetry failures.
Database & Query Utilities
Gradata/src/gradata/_db.py, Gradata/src/gradata/_query.py
Added logging for lessons-lock cleanup failures; reformatted query function signatures and FTS logic to multi-line expressions.
Export, Audit & Detection
Gradata/src/gradata/_export_brain.py, Gradata/src/gradata/audit.py, Gradata/src/gradata/detection/addition_pattern.py
Added logging to build_prospect_map() failures; replaced silent exceptions in fact extraction and pattern detection with warning logs.
Validator & Workers
Gradata/src/gradata/_validator.py, Gradata/src/gradata/_workers.py
Added logging for training-depth parse and event-count failures in validator; reformatted worker threading and job-type declarations.
Adapter & Cloud Libraries
Gradata/src/gradata/adapters/mem0.py, Gradata/src/gradata/cloud/...*
Minor formatting/error-logging updates for memory adapter and cloud client error classification.
Enhancement Modules (Meta, Rules, Patterns, Scoring, etc.)
Gradata/src/gradata/enhancements/*
Widespread updates across 50+ enhancement files: (1) replaced ~40 silent except: pass blocks with logger.warning(..., exc_info=True); (2) reformatted regex/constant/field/expression definitions from single-line to multi-line; (3) updated imports (e.g., Callable/Iterable from collections.abc); (4) no semantic logic changes beyond exception handling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #93: Overlapping codebase-wide enhancements, Apache relicensing, merged algorithm modules, and multi-file API additions across the same directories.
  • PR #142: Directly related cloud sync surface (push/pull), event PII redaction, and migrations (002/003/_ulid/device_uuid) changes.
  • PR #102: Both implement multi-tenant/cloud sync work—adding tenant_id handling, migration runners, and push-style _cloud_sync integration with schema/migration wiring.

Suggested labels

feature, enhancement, architecture, cloud, local-first

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/pmr-100-benchmark
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch feat/pmr-100-benchmark

@coderabbitai coderabbitai Bot added the feature label Apr 30, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 47

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (20)
Gradata/src/gradata/enhancements/dedup.py (1)

297-299: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make dedup failures observable in exception path

Line 297 catches all exceptions, but Line 298 logs only at debug and drops traceback context. This can mask persistent ingestion/dedup failures. Please either narrow exception types or log at warning with exc_info=True.

Proposed fix
-    except Exception as e:
-        _log.debug("Observation dedup failed: %s", e)
+    except (sqlite3.Error, OSError, ValueError):
+        _log.warning("Observation dedup failed", exc_info=True)
         return False

As per coding guidelines, "Never use bare except: pass; use typed exceptions or at minimum logger.warning(...) with exc_info=True."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/dedup.py` around lines 297 - 299, The except
Exception as e block that currently calls _log.debug("Observation dedup failed:
%s", e) should be changed so failures are observable: either catch specific
exception types raised by the dedup code (e.g., ValueError, KeyError, or any
domain-specific exceptions) instead of broad Exception, or at minimum change the
logging call to _log.warning("Observation dedup failed", exc_info=True) so the
traceback is recorded; keep the existing return False behavior. Ensure you
update the except clause and the _log call in the block that contains the caught
Exception and the current debug log.
Gradata/src/gradata/_query.py (1)

51-52: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Narrow the migration exception handling to duplicate-column only.

Line 51 currently suppresses all sqlite3.OperationalError, which can silently mask real DB failures (e.g., locked DB). The inline comment says only duplicate-column should be ignored, so the implementation should match that.

Suggested fix
-    with contextlib.suppress(sqlite3.OperationalError):
-        conn.execute("ALTER TABLE brain_fts_content ADD COLUMN tenant_id TEXT")
+    try:
+        conn.execute("ALTER TABLE brain_fts_content ADD COLUMN tenant_id TEXT")
+    except sqlite3.OperationalError as exc:
+        if "duplicate column name" not in str(exc).lower():
+            raise

As per coding guidelines: "Never use bare except: pass; use typed exceptions or at minimum logger.warning(...) with exc_info=True."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_query.py` around lines 51 - 52, The code currently
suppresses all sqlite3.OperationalError around the ALTER TABLE call; replace the
contextlib.suppress block with an explicit try/except that catches
sqlite3.OperationalError as e, inspects str(e).lower() for a duplicate-column
indicator (e.g., "duplicate column", "duplicate column name", or the SQLite
error code text) and only swallow the exception in that specific case; for any
other OperationalError re-raise it (or call
processLogger.warning/processLogger.error with exc_info=True before raising) so
real DB failures (like locks) are not silently ignored. Keep the ALTER statement
and the conn.execute("ALTER TABLE brain_fts_content ADD COLUMN tenant_id TEXT")
location intact while changing suppression logic around it.
Gradata/src/gradata/enhancements/router_warmstart.py (1)

123-123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add exc_info=True to capture the traceback.

The exception handler logs a warning but doesn't capture the stack trace, making warm-start failures difficult to debug. As per coding guidelines, exception logging should include exc_info=True to preserve traceback information.

🔍 Proposed fix to preserve traceback
-        _log.warning("Warm-start failed: %s", e)
+        _log.warning("Warm-start failed: %s", e, exc_info=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/router_warmstart.py` at line 123, The
warning log in router_warmstart.py currently calls _log.warning("Warm-start
failed: %s", e) without the traceback; update this exception handler to pass
exc_info=True to _log.warning so the stack trace is preserved (i.e., change the
_log.warning call in the warm-start exception handling block to include
exc_info=True while keeping the existing message and exception argument).
Gradata/src/gradata/enhancements/reporting.py (2)

349-355: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Health report can remain falsely HEALTHY when lesson counting fails.

If _count_active_lessons(...) raises, the code logs a warning but does not set report.healthy = False or record an issue, so output can misreport health.

Proposed fix
     try:
         brain_dir = Path(db_path).parent if db_path else (Path(ctx.brain_dir) if ctx else None)
         if brain_dir:
             report.lessons_active = _count_active_lessons(brain_dir)
     except Exception:
-        logger.warning('Suppressed exception in generate_health_report', exc_info=True)
+        logger.warning("Suppressed exception in generate_health_report", exc_info=True)
+        report.issues.append("Could not read lessons")
+        report.healthy = False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/reporting.py` around lines 349 - 355, The
try/except around calling _count_active_lessons in generate_health_report
swallows exceptions and leaves report.healthy untouched, allowing a false
"HEALTHY" status; change the except block to mark the report as unhealthy and
record the failure: set report.healthy = False and add a descriptive issue to
the report (e.g., using report.issues.append(...) or report.add_issue(...)
depending on the report API) including the exception message and context, and
continue to log the warning with exc_info=True via logger so the error is both
recorded in the report and logged.

192-197: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

include_instincts=True currently does not include low-confidence INSTINCT rules.

At Line 197, lesson.confidence >= min_confidence still gates INSTINCT entries, so low-confidence instincts remain excluded even when include_instincts is enabled (contradicting the function contract/docstring).

Proposed fix
-            for lesson in lessons:
-                state_upper = lesson.state.value.upper()
-                if state_upper in allowed_states and lesson.confidence >= min_confidence:
+            for lesson in lessons:
+                state_upper = lesson.state.value.upper()
+                meets_confidence = (
+                    state_upper == "INSTINCT" and include_instincts
+                ) or lesson.confidence >= min_confidence
+                if state_upper in allowed_states and meets_confidence:
                     briefing.rules.append(
                         BriefingRule(
                             category=lesson.category,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/reporting.py` around lines 192 - 197, The
loop over lessons currently filters INSTINCT entries by lesson.confidence >=
min_confidence, so enabling include_instincts doesn't include low-confidence
INSTINCTs; update the conditional in the for-loop that uses state_upper,
allowed_states and lesson.confidence so that if state_upper == "INSTINCT" and
include_instincts is True the lesson is accepted regardless of
lesson.confidence, otherwise keep the existing check (e.g. allow when
state_upper in allowed_states and lesson.confidence >= min_confidence); modify
the condition around the for-loop that references lesson.state.value,
state_upper, and lesson.confidence accordingly.
Gradata/src/gradata/_export_brain.py (3)

262-273: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fallback manifest should include the same export.mode field.

The success path sets manifest["export"]["mode"], but the fallback manifest omits it. Keep the schema consistent across both paths to avoid downstream branching surprises.

Suggested fix
-            "export": {"exported_at": now.isoformat(), "files": [path for path, _ in sanitized]},
+            "export": {
+                "exported_at": now.isoformat(),
+                "mode": "domain-only"
+                if domain_only
+                else ("no-prospects" if not include_prospects else "full"),
+                "files": [path for path, _ in sanitized],
+            },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_export_brain.py` around lines 262 - 273, The fallback
manifest being built in the failure path (the manifest dict initialized near
variables version, domain, sessions, graduated, active, sanitized, now) is
missing the same export.mode field present in the success path; update the
manifest["export"] entry to include "mode" (matching the success path semantics)
alongside "exported_at" and "files" so both paths produce the same schema and
avoid downstream branching surprises.

214-238: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

ctx is only partially honored, causing mixed-context exports.

From Line 217 through Line 237, metadata and file lists are still resolved via global _p paths (read_version, read_domain_name, collect_*, _LESSONS_*, _CARL_*, _QUALITY_RUBRICS) instead of the provided ctx. With non-default contexts, this can export the wrong brain/domain data.

Based on learnings, “Round 2 addressed ... tenant-scoped fallback ...” and this path-mixing breaks that same tenant/context-scoping principle.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_export_brain.py` around lines 214 - 238, The code is
using global path helpers and readers (read_version, read_domain_name,
read_session_count, count_lessons, collect_domain_files, collect_brain_files,
_LESSONS_ARCHIVE, _LESSONS_ACTIVE, _CARL_LOOP, _CARL_GLOBAL, _QUALITY_RUBRICS,
build_prospect_map) instead of the provided ctx, causing mixed-context exports;
update this block so every metadata read and file collection uses the ctx-scoped
equivalents (e.g., call context-aware readers or pass ctx into
read_version/read_domain_name/read_session_count/count_lessons/collect_* or
resolve lesson/carl/quality paths from ctx) and make build_prospect_map use
ctx.prospects_dir (or fall back to _p only when ctx is None) so all resolved
files and counts are consistently tenant-scoped.

241-244: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not silently skip unreadable source files in export loop.

Line 243 suppresses all read failures and continues, which can produce incomplete archives without any signal. Log and narrow exception types here.

Suggested fix
-        except Exception:
-            continue
+        except (OSError, UnicodeDecodeError):
+            logger.warning(
+                "Skipping unreadable export source: %s",
+                source_path,
+                exc_info=True,
+            )
+            continue

As per coding guidelines, “Never use bare except: pass; use typed exceptions or at minimum logger.warning(...) with exc_info=True.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_export_brain.py` around lines 241 - 244, The loop
currently swallows all errors when calling
source_path.read_text(encoding="utf-8"); replace the bare except with a typed
exception handler (e.g. except (OSError, UnicodeDecodeError) as e:) and log a
warning that includes the file path and the exception (use
logger.warning(f"Unable to read {source_path}: {e}", exc_info=True)) before
continuing; if your module uses a different logger name, use that (e.g.
self.logger) so unreadable files are reported instead of silently skipped.
Gradata/src/gradata/enhancements/clustering.py (1)

95-100: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard non-dict scope_json payloads before calling .get().

At Line 97, json.loads may return a list/string/number; calling .get() then raises AttributeError and breaks clustering instead of falling back to "global".

Proposed fix
         if lesson.scope_json:
             try:
                 scope = json.loads(lesson.scope_json)
-                return scope.get("domain", "") or "global"
+                if isinstance(scope, dict):
+                    return scope.get("domain", "") or "global"
+                logger.warning(
+                    "Invalid scope_json type in cluster_rules._extract_domain: expected object, got %s",
+                    type(scope).__name__,
+                )
+                return "global"
             except (json.JSONDecodeError, TypeError):
                 logger.warning('Suppressed exception in cluster_rules._extract_domain', exc_info=True)
         return "global"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/clustering.py` around lines 95 - 100, The
current _extract_domain logic calls scope.get(...) after json.loads which can
return non-dict types; modify the try block in clustering.py (inside
cluster_rules._extract_domain) to validate that scope is a dict before calling
.get — if isinstance(scope, dict) use scope.get("domain", "") or "global",
otherwise return "global"; keep the existing except (json.JSONDecodeError,
TypeError) handler and logger.warning as-is to preserve error logging.
Gradata/src/gradata/_context_packet.py (1)

231-240: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

_load_audit_context should close SQLite connections in finally.

If any statement raises before conn.close(), the connection remains open.

Suggested fix
 def _load_audit_context(session: int, ctx: "BrainContext | None" = None) -> dict:
     result = {"metrics": {}, "outputs": [], "gates": [], "correction_rate": {}}
+    conn = None
     try:
         db = ctx.db_path if ctx else _p.DB_PATH
         conn = sqlite3.connect(str(db))
         conn.row_factory = sqlite3.Row
         row = conn.execute("SELECT * FROM session_metrics WHERE session = ?", (session,)).fetchone()
         if row:
             result["metrics"] = dict(row)
-        conn.close()
     except Exception as e:
         _log.debug("audit metrics query failed (non-fatal): %s", e)
+    finally:
+        if conn is not None:
+            with contextlib.suppress(Exception):
+                conn.close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_context_packet.py` around lines 231 - 240, The
try/except in _load_audit_context opens a SQLite connection with sqlite3.connect
and calls conn.close() only on the normal path, leaving conn open if an
exception occurs; move connection cleanup into a finally block (or use a context
manager) so that conn.close() is always called: ensure the sqlite3.connect(...)
result (conn) is closed in a finally or use "with sqlite3.connect(...)" around
the body that sets conn.row_factory, executes the SELECT on session_metrics,
sets result["metrics"], and then closes the connection regardless of errors,
keeping the existing except _log.debug(...) behavior for non-fatal errors.
Gradata/src/gradata/_manifest_helpers.py (1)

56-68: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure DB connections are always closed on error paths.

conn.close() currently executes only on success. If execute()/fetchall() raises, the connection leaks. Wrap connection lifetime in finally (or contextlib.closing) in both helpers.

Suggested fix
 def _count_events(ctx: "BrainContext | None" = None) -> dict:
     result = {"total": 0, "by_type": {}}
+    conn = None
     try:
         db = ctx.db_path if ctx else _p.DB_PATH
         conn = get_connection(db)
         rows = conn.execute("SELECT type, COUNT(*) as cnt FROM events GROUP BY type").fetchall()
         for row in rows:
             result["by_type"][row["type"]] = row["cnt"]
             result["total"] += row["cnt"]
-        conn.close()
     except Exception:
         logger.warning('Suppressed exception in _count_events', exc_info=True)
+    finally:
+        if conn is not None:
+            conn.close()
     return result
 
 def _get_tables(ctx: "BrainContext | None" = None) -> list[str]:
+    conn = None
     try:
         db = ctx.db_path if ctx else _p.DB_PATH
         conn = get_connection(db)
         rows = conn.execute(
             "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
         ).fetchall()
-        conn.close()
         return [r[0] for r in rows]
     except Exception:
         return []
+    finally:
+        if conn is not None:
+            conn.close()

Also applies to: 72-81

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_manifest_helpers.py` around lines 56 - 68, The DB
connection in _count_events (and the other similar helper at the nearby block)
can leak if execute()/fetchall() raises because conn.close() is only called on
the success path; wrap the connection lifetime so it always closes (e.g., use
contextlib.closing(get_connection(db)) as conn: or assign conn =
get_connection(db) and call conn.close() in a finally: block) and move the SQL
execution inside that safe scope; ensure you update both _count_events and the
analogous helper (the function that queries results in the nearby 72-81 block)
to always close the connection even on errors.
Gradata/src/gradata/contrib/patterns/human_loop.py (1)

494-507: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

context is ignored in HumanLoopGate.check

Line 501 calls gate(action) directly, so context never affects risk override/entity extraction in this path. That can change gating outcomes unexpectedly.

Proposed fix
-def gate(
+def gate(
     action: str,
     risk: RiskAssessment | None = None,
     auto_approve_low: bool = True,
+    context: dict[str, Any] | None = None,
 ) -> ApprovalRequest | None:
@@
     return ApprovalRequest(
         action=action,
         risk=risk,
-        preview=preview_action(action, context=None),
+        preview=preview_action(action, context=context),
     )
@@
     def check(
         self,
         action: str,
         context: dict | None = None,
         approver: Callable | None = None,
     ) -> ApprovalResult:
         """Full gate check: assess risk, request approval if needed."""
-        request = gate(action)
+        risk = assess_risk(action, context)
+        request = gate(action, risk=risk, context=context)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/contrib/patterns/human_loop.py` around lines 494 - 507,
HumanLoopGate.check currently ignores the provided context by calling
gate(action) which prevents context-driven risk overrides/entity extraction;
update the call to invoke gate with the context (e.g., gate(action, context)) so
the gate function receives and can use the context when producing the request,
and ensure the rest of the method (returning approver(request) or
ApprovalResult) continues to operate on that request.
Gradata/src/gradata/contrib/patterns/rag.py (1)

391-406: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Two-pass retrieval is unreachable due to the early hybrid return

On Line 391, any non-empty all_results returns at Line 402, so the cfg.two_pass branch at Line 405 never executes. This makes two_pass=True ineffective.

Proposed fix
-    if all_results:
+    hybrid_result: RetrievalResult | None = None
+    if all_results:
         merged = rrf_merge(*all_results, k=cfg.hybrid_rrf_k)
         graduated = apply_graduation_scoring(merged, cfg)
-        result = RetrievalResult(
+        hybrid_result = RetrievalResult(
             chunks=graduated[:limit],
             query=query,
             mode="hybrid",
             total_candidates=total,
         )
         if _cascade_errors:
-            result.mode = f"hybrid (warnings: {', '.join(_cascade_errors)})"
-        return result
+            hybrid_result.mode = f"hybrid (warnings: {', '.join(_cascade_errors)})"
@@
     if cfg.two_pass and all_results:
         ...
             if pass2_results:
                 return RetrievalResult(...)
+
+    if hybrid_result is not None:
+        return hybrid_result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/contrib/patterns/rag.py` around lines 391 - 406, The
hybrid-return happens whenever all_results is non-empty, which prevents the
subsequent cfg.two_pass branch from ever executing; move or restructure the
two-pass logic so cfg.two_pass is evaluated before returning the hybrid
RetrievalResult. Concretely, before calling rrf_merge/apply_graduation_scoring
and building the RetrievalResult (functions rrf_merge, apply_graduation_scoring,
class RetrievalResult and variables all_results, _cascade_errors, limit, total,
query), check cfg.two_pass and perform the two-pass expansion (the flat_chunks
handling) and update all_results accordingly; only after two_pass processing (or
skipping it) perform the merge, graduation scoring, set result.mode (including
_cascade_errors) and return the final RetrievalResult.
Gradata/src/gradata/_manifest_metrics.py (1)

58-66: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure DB connections are closed on exception paths.

Several functions close conn only on the success path. If an exception occurs after open (for example after Line 60 or Line 136), the connection can leak.

💡 Suggested pattern (apply to each DB block)
+from contextlib import closing
+
 def _correction_rate_trend(ctx: "BrainContext | None" = None, window: int = 10) -> dict | None:
     """Compare current CRO window to baseline window."""
     try:
         db = ctx.db_path if ctx else _p.DB_PATH
-        conn = get_connection(db)
-        max_session, _ = _session_window(conn, window)
+        with closing(get_connection(db)) as conn:
+            max_session, _ = _session_window(conn, window)
 
-        if max_session < window * 2:
-            conn.close()
-            return None
+            if max_session < window * 2:
+                return None
 
-        def _cro(min_s, max_s):
-            outputs = (
-                conn.execute(
+            def _cro(min_s, max_s):
+                outputs = (
+                    conn.execute(
                     "SELECT COUNT(*) FROM events WHERE type='OUTPUT' AND session BETWEEN ? AND ?",
                     (min_s, max_s),
-                ).fetchone()[0]
-                or 0
-            )
-            corrections = (
-                conn.execute(
+                    ).fetchone()[0]
+                    or 0
+                )
+                corrections = (
+                    conn.execute(
                     "SELECT COUNT(*) FROM events WHERE type='CORRECTION' AND session BETWEEN ? AND ?",
                     (min_s, max_s),
-                ).fetchone()[0]
-                or 0
-            )
-            return round(corrections / outputs, 4) if outputs > 0 else None
+                    ).fetchone()[0]
+                    or 0
+                )
+                return round(corrections / outputs, 4) if outputs > 0 else None
 
-        current = _cro(max_session - window + 1, max_session)
-        baseline = _cro(max_session - window * 2 + 1, max_session - window)
-        conn.close()
+            current = _cro(max_session - window + 1, max_session)
+            baseline = _cro(max_session - window * 2 + 1, max_session - window)
 
         if current is None or baseline is None:
             return None

Also applies to: 84-87, 102-104, 134-137, 199-221, 307-312, 339-342, 376-379, 390-393, 493-501

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_manifest_metrics.py` around lines 58 - 66, The DB
connection created by get_connection(...) (e.g., in the block that calls
_session_window and assigns conn) can leak on exception paths; ensure every
place that opens a connection (references to get_connection, the blocks around
_session_window, and the other listed DB blocks) always closes the connection by
using a try/finally (or a context manager wrapper) so conn.close() runs on both
success and exception paths; update each block (including the sections that
reference max_session, the subsequent query blocks and the ranges noted) to wrap
the work in try/finally (or use a connection context) and call conn.close() in
finally.
Gradata/src/gradata/_migrations/__init__.py (1)

109-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Log suppressed base-table migration errors too.

Line 109-110 still swallows sqlite3.OperationalError silently for _BASE_TABLES; that can hide startup schema failures and make migration debugging hard.

Proposed fix
-    for sql in _BASE_TABLES:
-        with contextlib.suppress(sqlite3.OperationalError):
-            conn.execute(sql)
+    for sql in _BASE_TABLES:
+        try:
+            conn.execute(sql)
+        except sqlite3.OperationalError:
+            logger.warning("Suppressed exception in _apply_inline base table setup", exc_info=True)
As per coding guidelines, "Never use bare `except: pass`; use typed exceptions or at minimum `logger.warning(...)` with `exc_info=True`."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_migrations/__init__.py` around lines 109 - 110, The
current use of contextlib.suppress(sqlite3.OperationalError) around
conn.execute(sql) for _BASE_TABLES silently swallows schema errors; change it to
an explicit try/except: wrap conn.execute(sql) in try and catch
sqlite3.OperationalError as e, and call the module logger.warning (or
appropriate logger) including a clear message, the SQL statement context (sql or
_BASE_TABLES) and exc_info=True so the traceback is recorded; keep suppressing
only after logging so startup/schema failures are visible for debugging. Ensure
you reference the same symbols: conn.execute(sql), sqlite3.OperationalError, and
_BASE_TABLES.
Gradata/REPORT.md (1)

1-81: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Please drop this operational handoff report from the PR.

REPORT.md reads like a sandbox/session artifact rather than durable product or developer documentation. It adds environment-specific noise and is likely to go stale immediately.

Based on learnings: Never commit scratch files (.tmp/, .archive/, sessions/handoff-*.md, files named 0 or BrainDetail); these belong in .gitignore.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/REPORT.md` around lines 1 - 81, REPORT.md is a sandbox operational
handoff that should not be committed; remove REPORT.md from the PR/diff and any
staged changes, and instead add appropriate ignore rules (e.g., patterns for
scratch/session files like REPORT.md, sessions/handoff-*.md, .tmp/, .archive/,
files named "0" or "BrainDetail") to the repository .gitignore so future
ephemeral artifacts are excluded; ensure the commit only contains the intended
code changes (e.g., your AST test tests/test_no_bare_excepts.py and the
src/gradata edits) and not this report, then re-run the commit/PR without
REPORT.md included.
Gradata/src/gradata/_core.py (1)

1337-1343: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Env-only cloud auth now logs a warning on every session close.

This re-parses ~/.gradata/config.toml for sync_mode even when the user authenticated purely through GRADATA_API_KEY. In that case read_text() raises and we emit a warning+traceback on every session, even though the setup is valid. Guard with config_path.is_file() or drop this to DEBUG.

Suggested fix
         sync_mode = "metrics_only"
-        try:
-            cfg = _parse_toml_cloud(config_path)
-            sync_mode = cfg.get("sync_mode", "metrics_only")
-        except Exception:
-            logger.warning('Suppressed exception in _cloud_sync_session', exc_info=True)
+        if config_path.is_file():
+            try:
+                cfg = _parse_toml_cloud(config_path)
+                sync_mode = cfg.get("sync_mode", "metrics_only")
+            except Exception as e:
+                _log.debug("cloud sync_mode parse failed: %s", e)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_core.py` around lines 1337 - 1343, The code in
_cloud_sync_session re-parses the config file via _parse_toml_cloud even when
authentication is from env (GRADATA_API_KEY), causing spurious warnings; modify
the block to first check config_path.is_file() before calling _parse_toml_cloud
(or alternatively change logger.warning to logger.debug), e.g. guard the
try/except with if config_path.is_file(): then call cfg =
_parse_toml_cloud(config_path) and set sync_mode = cfg.get("sync_mode",
"metrics_only"), leaving the exception handling to log only when the file
exists.
Gradata/src/gradata/enhancements/meta_rules.py (3)

632-668: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

refresh_meta_rules() still bypasses the new local discovery path.

This function still logs "requires Gradata Cloud" and returns only validated existing entries. Any caller that uses the public refresh API will never see the local-first discover_meta_rules() output added in this PR.

Suggested fix
 def refresh_meta_rules(
     lessons: list[Lesson],
     existing_metas: list[MetaRule],
@@
 ) -> list[MetaRule]:
@@
-    _log.info("Meta-rule discovery requires Gradata Cloud")
     corrections = recent_corrections or []
 
-    # Validate existing meta-rules (invalidation still works locally)
-    valid: list[MetaRule] = []
+    valid: dict[str, MetaRule] = {}
     for meta in existing_metas:
         if validate_meta_rule(meta, corrections):
             meta.last_validated_session = current_session
-            valid.append(meta)
+            valid[meta.id] = meta
 
-    valid.sort(key=lambda m: m.confidence, reverse=True)
-    return valid
+    for meta in discover_meta_rules(
+        lessons,
+        min_group_size=min_group_size,
+        current_session=current_session,
+        **kwargs,
+    ):
+        valid.setdefault(meta.id, meta)
+
+    return sorted(valid.values(), key=lambda m: m.confidence, reverse=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 632 - 668,
refresh_meta_rules currently always skips local discovery and only validates
existing_metas; call the local discover_meta_rules(lessons,
min_group_size=min_group_size, recent_corrections=recent_corrections or [],
**kwargs) inside refresh_meta_rules to obtain new_meta_candidates, validate
those with validate_meta_rule (same as existing flow), set
last_validated_session=current_session on validated discovered metas, merge them
with validated existing_metas (deduplicate by unique id or rule signature), sort
by confidence and return the combined list; keep the log message but update it
to note that local discovery runs and cloud is required only for full discovery
if needed.

532-565: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Filter non-injectable meta-rules before formatting.

format_meta_rules_for_prompt() is the prompt-injection path, but it currently formats deterministic meta-rules too. That breaks the INJECTABLE_META_SOURCES invariant and can reintroduce the known-bad principles whenever callers pass raw discovery results.

Suggested fix
     if not metas:
         return ""
 
     # Filter by preconditions/anti-conditions
     if condition_context is not None:
         metas = [m for m in metas if evaluate_conditions(m, condition_context)]
 
+    metas = [m for m in metas if m.source in INJECTABLE_META_SOURCES]
     if not metas:
         return ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 532 - 565,
format_meta_rules_for_prompt currently includes deterministic/non-injectable
meta-rules; before formatting you must filter metas to only those whose source
is allowed by INJECTABLE_META_SOURCES. After the existing filtering/ranking
steps (e.g., after the context/limit block and before constructing lines = ["##
Brain Meta-Rules..."]), add a filter like: metas = [m for m in metas if m.source
in INJECTABLE_META_SOURCES] (or the correct attribute that identifies the meta's
origin), so only injectable meta-rules are rendered.

763-785: ⚠️ Potential issue | 🟠 Major

GRADATA_GEMMA_API_KEY is incompatible with llm_synthesize_rules() due to auth scheme mismatch.

_resolve_llm_credentials() returns Gemma credentials as valid input, but _call_llm_for_synthesis() always POST to /chat/completions with OpenAI-style bearer auth. Gemma's native API requires x-goog-api-key header and a different endpoint. With only GRADATA_GEMMA_API_KEY set, the credential check passes at line 967–970, but the synthesis call fails at the request.

Meanwhile, _try_llm_principle() correctly handles both providers via separate branches: OpenAI via synthesise_principle_llm() (lines 848–860) and Gemma via _call_gemma_native() (lines 862–865).

Either route Gemma credentials through _call_gemma_native() in _call_llm_for_synthesis(), or restrict _resolve_llm_credentials() to return only OpenAI-compatible credentials for the synthesis path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 763 - 785,
_resolve_llm_credentials currently returns Gemma credentials which are
incompatible with the OpenAI-style POST and Bearer auth used by
_call_llm_for_synthesis; update the synthesis path so Gemma creds are handled by
the Gemma-specific branch instead of being passed to _call_llm_for_synthesis.
Concretely: in the logic that invokes _call_llm_for_synthesis (the synthesis
flow referenced by _try_llm_principle and the credential check at lines
~967–970), detect when GRADATA_GEMMA_API_KEY is present and route the call to
_call_gemma_native (or synthesise_principle_llm for OpenAI) rather than calling
_call_llm_for_synthesis with gemma credentials; alternatively, change
_resolve_llm_credentials to only return OpenAI-compatible key/base/model for the
synthesis path and keep Gemma usage confined to _call_gemma_native. Ensure you
modify the calling site(s) that currently assume any non-empty credentials are
OpenAI-compatible.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0e5f3a02-3cce-40e0-be5b-f5b3337c7df1

📥 Commits

Reviewing files that changed from the base of the PR and between 5635a66 and dd0ce8b.

⛔ Files ignored due to path filters (1)
  • .claude/hooks/statusline/sprites-statusline.js is excluded by !.claude/**
📒 Files selected for processing (263)
  • .gitignore
  • Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py
  • Gradata/.gitignore
  • Gradata/AGENTS.md
  • Gradata/CHANGELOG.md
  • Gradata/CHANGES.md
  • Gradata/REPORT.md
  • Gradata/bench/README.md
  • Gradata/bench/__init__.py
  • Gradata/bench/pmr_100.py
  • Gradata/docs/LEGACY_CLEANUP.md
  • Gradata/docs/architecture/cloud-monolith-v2.md
  • Gradata/docs/architecture/multi-tenant-future-proofing.md
  • Gradata/docs/cloud/dashboard.md
  • Gradata/docs/cloud/overview.md
  • Gradata/docs/concepts/meta-rules.md
  • Gradata/docs/specs/cloud-sync-and-pricing.md
  • Gradata/hooks/hooks.json
  • Gradata/migrations/supabase/014_corrections_unique.sql
  • Gradata/migrations/supabase/015_events_unique.sql
  • Gradata/migrations/supabase/016_brains_last_used_at.sql
  • Gradata/migrations/supabase/README.md
  • Gradata/skills/core/session-start/SKILL.md
  • Gradata/src/gradata/__init__.py
  • Gradata/src/gradata/_atomic.py
  • Gradata/src/gradata/_brain_lock.py
  • Gradata/src/gradata/_brain_manifest.py
  • Gradata/src/gradata/_cloud_sync.py
  • Gradata/src/gradata/_config.py
  • Gradata/src/gradata/_config_paths.py
  • Gradata/src/gradata/_context_compile.py
  • Gradata/src/gradata/_context_packet.py
  • Gradata/src/gradata/_core.py
  • Gradata/src/gradata/_data_flow_audit.py
  • Gradata/src/gradata/_db.py
  • Gradata/src/gradata/_doctor.py
  • Gradata/src/gradata/_events.py
  • Gradata/src/gradata/_export_brain.py
  • Gradata/src/gradata/_fact_extractor.py
  • Gradata/src/gradata/_file_lock.py
  • Gradata/src/gradata/_http.py
  • Gradata/src/gradata/_installer.py
  • Gradata/src/gradata/_manifest_helpers.py
  • Gradata/src/gradata/_manifest_metrics.py
  • Gradata/src/gradata/_manifest_quality.py
  • Gradata/src/gradata/_migrations/001_add_tenant_id.py
  • Gradata/src/gradata/_migrations/002_add_event_identity.py
  • Gradata/src/gradata/_migrations/003_add_sync_state.py
  • Gradata/src/gradata/_migrations/__init__.py
  • Gradata/src/gradata/_migrations/_runner.py
  • Gradata/src/gradata/_migrations/_ulid.py
  • Gradata/src/gradata/_migrations/device_uuid.py
  • Gradata/src/gradata/_migrations/fill_null_tenant.py
  • Gradata/src/gradata/_migrations/tenant_uuid.py
  • Gradata/src/gradata/_mine_transcripts.py
  • Gradata/src/gradata/_paths.py
  • Gradata/src/gradata/_query.py
  • Gradata/src/gradata/_stats.py
  • Gradata/src/gradata/_tag_taxonomy.py
  • Gradata/src/gradata/_telemetry.py
  • Gradata/src/gradata/_tenant.py
  • Gradata/src/gradata/_text_utils.py
  • Gradata/src/gradata/_transcript.py
  • Gradata/src/gradata/_transcript_providers.py
  • Gradata/src/gradata/_types.py
  • Gradata/src/gradata/_validator.py
  • Gradata/src/gradata/_workers.py
  • Gradata/src/gradata/adapters/mem0.py
  • Gradata/src/gradata/audit.py
  • Gradata/src/gradata/brain.py
  • Gradata/src/gradata/brain_inspection.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/cloud/client.py
  • Gradata/src/gradata/cloud/sync.py
  • Gradata/src/gradata/context_wrapper.py
  • Gradata/src/gradata/contrib/enhancements/eval_benchmark.py
  • Gradata/src/gradata/contrib/enhancements/install_manifest.py
  • Gradata/src/gradata/contrib/enhancements/quality_gates.py
  • Gradata/src/gradata/contrib/enhancements/truth_protocol.py
  • Gradata/src/gradata/contrib/patterns/__init__.py
  • Gradata/src/gradata/contrib/patterns/agent_modes.py
  • Gradata/src/gradata/contrib/patterns/context_brackets.py
  • Gradata/src/gradata/contrib/patterns/evaluator.py
  • Gradata/src/gradata/contrib/patterns/execute_qualify.py
  • Gradata/src/gradata/contrib/patterns/guardrails.py
  • Gradata/src/gradata/contrib/patterns/human_loop.py
  • Gradata/src/gradata/contrib/patterns/loop_detection.py
  • Gradata/src/gradata/contrib/patterns/mcp.py
  • Gradata/src/gradata/contrib/patterns/memory.py
  • Gradata/src/gradata/contrib/patterns/middleware.py
  • Gradata/src/gradata/contrib/patterns/orchestrator.py
  • Gradata/src/gradata/contrib/patterns/parallel.py
  • Gradata/src/gradata/contrib/patterns/pipeline.py
  • Gradata/src/gradata/contrib/patterns/q_learning_router.py
  • Gradata/src/gradata/contrib/patterns/rag.py
  • Gradata/src/gradata/contrib/patterns/reconciliation.py
  • Gradata/src/gradata/contrib/patterns/reflection.py
  • Gradata/src/gradata/contrib/patterns/sub_agents.py
  • Gradata/src/gradata/contrib/patterns/task_escalation.py
  • Gradata/src/gradata/contrib/patterns/tools.py
  • Gradata/src/gradata/contrib/patterns/tree_of_thoughts.py
  • Gradata/src/gradata/correction_detector.py
  • Gradata/src/gradata/daemon.py
  • Gradata/src/gradata/detection/addition_pattern.py
  • Gradata/src/gradata/enhancements/_sanitize.py
  • Gradata/src/gradata/enhancements/bandits/collaborative_filter.py
  • Gradata/src/gradata/enhancements/bandits/contextual_bandit.py
  • Gradata/src/gradata/enhancements/behavioral_engine.py
  • Gradata/src/gradata/enhancements/causal_chains.py
  • Gradata/src/gradata/enhancements/cluster_manager.py
  • Gradata/src/gradata/enhancements/clustering.py
  • Gradata/src/gradata/enhancements/contradiction_detector.py
  • Gradata/src/gradata/enhancements/dedup.py
  • Gradata/src/gradata/enhancements/diff_engine.py
  • Gradata/src/gradata/enhancements/edit_classifier.py
  • Gradata/src/gradata/enhancements/freshness.py
  • Gradata/src/gradata/enhancements/git_backfill.py
  • Gradata/src/gradata/enhancements/graduation/agent_graduation.py
  • Gradata/src/gradata/enhancements/graduation/judgment_decay.py
  • Gradata/src/gradata/enhancements/graduation/rules_distillation.py
  • Gradata/src/gradata/enhancements/graduation/scoring.py
  • Gradata/src/gradata/enhancements/instruction_cache.py
  • Gradata/src/gradata/enhancements/learning_pipeline.py
  • Gradata/src/gradata/enhancements/lesson_discriminator.py
  • Gradata/src/gradata/enhancements/llm_provider.py
  • Gradata/src/gradata/enhancements/llm_synthesizer.py
  • Gradata/src/gradata/enhancements/memory_taxonomy.py
  • Gradata/src/gradata/enhancements/meta_rules.py
  • Gradata/src/gradata/enhancements/meta_rules_storage.py
  • Gradata/src/gradata/enhancements/metrics.py
  • Gradata/src/gradata/enhancements/observation_hooks.py
  • Gradata/src/gradata/enhancements/pattern_extractor.py
  • Gradata/src/gradata/enhancements/pattern_integration.py
  • Gradata/src/gradata/enhancements/pipeline_rewriter.py
  • Gradata/src/gradata/enhancements/profiling/tone_profile.py
  • Gradata/src/gradata/enhancements/prompt_synthesizer.py
  • Gradata/src/gradata/enhancements/reporting.py
  • Gradata/src/gradata/enhancements/retrieval_fusion.py
  • Gradata/src/gradata/enhancements/router_warmstart.py
  • Gradata/src/gradata/enhancements/rule_canary.py
  • Gradata/src/gradata/enhancements/rule_context_bridge.py
  • Gradata/src/gradata/enhancements/rule_export.py
  • Gradata/src/gradata/enhancements/rule_integrity.py
  • Gradata/src/gradata/enhancements/rule_pipeline.py
  • Gradata/src/gradata/enhancements/rule_synthesizer.py
  • Gradata/src/gradata/enhancements/rule_to_hook.py
  • Gradata/src/gradata/enhancements/rule_verifier.py
  • Gradata/src/gradata/enhancements/scoring/brain_scores.py
  • Gradata/src/gradata/enhancements/scoring/calibration.py
  • Gradata/src/gradata/enhancements/scoring/correction_tracking.py
  • Gradata/src/gradata/enhancements/scoring/failure_detectors.py
  • Gradata/src/gradata/enhancements/scoring/gate_calibration.py
  • Gradata/src/gradata/enhancements/scoring/loop_intelligence.py
  • Gradata/src/gradata/enhancements/scoring/memory_extraction.py
  • Gradata/src/gradata/enhancements/scoring/reports.py
  • Gradata/src/gradata/enhancements/scoring/success_conditions.py
  • Gradata/src/gradata/enhancements/self_improvement/__init__.py
  • Gradata/src/gradata/enhancements/self_improvement/_confidence.py
  • Gradata/src/gradata/enhancements/self_improvement/_graduation.py
  • Gradata/src/gradata/enhancements/similarity.py
  • Gradata/src/gradata/enhancements/skill_export.py
  • Gradata/src/gradata/events_bus.py
  • Gradata/src/gradata/exceptions.py
  • Gradata/src/gradata/graph.py
  • Gradata/src/gradata/hooks/_base.py
  • Gradata/src/gradata/hooks/_generated_runner_core.py
  • Gradata/src/gradata/hooks/_installer.py
  • Gradata/src/gradata/hooks/_profiles.py
  • Gradata/src/gradata/hooks/agent_graduation.py
  • Gradata/src/gradata/hooks/agent_precontext.py
  • Gradata/src/gradata/hooks/auto_correct.py
  • Gradata/src/gradata/hooks/brain_maintain.py
  • Gradata/src/gradata/hooks/claude_code.py
  • Gradata/src/gradata/hooks/client.py
  • Gradata/src/gradata/hooks/config_protection.py
  • Gradata/src/gradata/hooks/config_validate.py
  • Gradata/src/gradata/hooks/context_inject.py
  • Gradata/src/gradata/hooks/ctx_watchdog.py
  • Gradata/src/gradata/hooks/daemon.py
  • Gradata/src/gradata/hooks/dict
  • Gradata/src/gradata/hooks/dispatch_post.py
  • Gradata/src/gradata/hooks/duplicate_guard.py
  • Gradata/src/gradata/hooks/generated_runner.py
  • Gradata/src/gradata/hooks/generated_runner_post.py
  • Gradata/src/gradata/hooks/graph_first_check.py
  • Gradata/src/gradata/hooks/graph_session_track.py
  • Gradata/src/gradata/hooks/implicit_feedback.py
  • Gradata/src/gradata/hooks/inject_brain_rules.py
  • Gradata/src/gradata/hooks/jit_inject.py
  • Gradata/src/gradata/hooks/pre_compact.py
  • Gradata/src/gradata/hooks/rule_enforcement.py
  • Gradata/src/gradata/hooks/secret_scan.py
  • Gradata/src/gradata/hooks/self_review.py
  • Gradata/src/gradata/hooks/session_boot.py
  • Gradata/src/gradata/hooks/session_close.py
  • Gradata/src/gradata/hooks/session_persist.py
  • Gradata/src/gradata/hooks/stale_hook_check.py
  • Gradata/src/gradata/hooks/status_line.py
  • Gradata/src/gradata/hooks/telemetry_summary.py
  • Gradata/src/gradata/hooks/tool_failure_emit.py
  • Gradata/src/gradata/hooks/tool_finding_capture.py
  • Gradata/src/gradata/inspection.py
  • Gradata/src/gradata/integrations/anthropic_adapter.py
  • Gradata/src/gradata/integrations/langchain_adapter.py
  • Gradata/src/gradata/integrations/openai_adapter.py
  • Gradata/src/gradata/mcp_server.py
  • Gradata/src/gradata/mcp_tools.py
  • Gradata/src/gradata/middleware/__init__.py
  • Gradata/src/gradata/middleware/_core.py
  • Gradata/src/gradata/middleware/anthropic_adapter.py
  • Gradata/src/gradata/middleware/crewai_adapter.py
  • Gradata/src/gradata/middleware/langchain_adapter.py
  • Gradata/src/gradata/middleware/openai_adapter.py
  • Gradata/src/gradata/notifications.py
  • Gradata/src/gradata/onboard.py
  • Gradata/src/gradata/patterns/__init__.py
  • Gradata/src/gradata/rules/rule_context.py
  • Gradata/src/gradata/rules/rule_engine/__init__.py
  • Gradata/src/gradata/rules/rule_engine/_formatting.py
  • Gradata/src/gradata/rules/rule_engine/_scoring.py
  • Gradata/src/gradata/rules/rule_graph.py
  • Gradata/src/gradata/rules/rule_ranker.py
  • Gradata/src/gradata/rules/scope.py
  • Gradata/src/gradata/safety.py
  • Gradata/src/gradata/security/correction_hash.py
  • Gradata/src/gradata/security/correction_provenance.py
  • Gradata/src/gradata/security/manifest_signing.py
  • Gradata/src/gradata/sidecar/watcher.py
  • Gradata/tests/conftest.py
  • Gradata/tests/test_agent_graduation.py
  • Gradata/tests/test_brain_dir_required.py
  • Gradata/tests/test_brain_lock.py
  • Gradata/tests/test_bug_fixes.py
  • Gradata/tests/test_cloud_row_push.py
  • Gradata/tests/test_cloud_sync.py
  • Gradata/tests/test_cluster_injection.py
  • Gradata/tests/test_ctx_watchdog.py
  • Gradata/tests/test_doctor_cloud.py
  • Gradata/tests/test_emit_pii_redaction.py
  • Gradata/tests/test_graph_enforcement.py
  • Gradata/tests/test_hooks_intelligence.py
  • Gradata/tests/test_hooks_learning.py
  • Gradata/tests/test_implicit_feedback.py
  • Gradata/tests/test_import_integrity.py
  • Gradata/tests/test_inject_watchdog_phases.py
  • Gradata/tests/test_integration_workflow.py
  • Gradata/tests/test_lesson_applications.py
  • Gradata/tests/test_llm_synthesizer.py
  • Gradata/tests/test_mem0_adapter.py
  • Gradata/tests/test_meta_rule_generalization.py
  • Gradata/tests/test_meta_rules.py
  • Gradata/tests/test_migration_002_event_identity.py
  • Gradata/tests/test_migration_003_sync_state.py
  • Gradata/tests/test_multi_brain_simulation.py
  • Gradata/tests/test_no_bare_excepts.py
  • Gradata/tests/test_pipeline_e2e.py
  • Gradata/tests/test_pre_compact.py
  • Gradata/tests/test_rule_graph_atomic.py
  • Gradata/tests/test_rule_pipeline.py
  • Gradata/tests/test_rule_synthesizer.py
  • Gradata/tests/test_session_close_loop_state.py
  • Gradata/tests/test_skill_export.py
  • Gradata/tests/test_transcript.py

Comment thread Gradata/AGENTS.md
Comment on lines +3 to +6
> Guidance for AGENTS.md-aware coding agents (Hermes Agent, Codex, Gemini CLI, OpenCode, etc.) working in this repository.
>
> Claude Code reads `CLAUDE.md` instead — keep both files in sync when changing project-wide guidance.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

fd -t f -name "CLAUDE.md" -o -name "AGENTS.md"

Repository: Gradata/gradata

Length of output: 290


🏁 Script executed:

cat -n Gradata/AGENTS.md | head -200

Repository: Gradata/gradata

Length of output: 8948


🏁 Script executed:

test -f Gradata/CLAUDE.md && echo "CLAUDE.md exists" || echo "CLAUDE.md does not exist"

Repository: Gradata/gradata

Length of output: 84


🏁 Script executed:

test -f Gradata/CLAUDE.md && cat -n Gradata/CLAUDE.md | head -100

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

git status Gradata/AGENTS.md

Repository: Gradata/gradata

Length of output: 125


🏁 Script executed:

git log --oneline -1 -- Gradata/AGENTS.md

Repository: Gradata/gradata

Length of output: 141


🏁 Script executed:

sed -n '75,86p; 100,108p; 173,182p' Gradata/AGENTS.md | cat -n

Repository: Gradata/gradata

Length of output: 1081


CLAUDE.md does not exist; either create it or remove the reference at lines 3–6.

The file declares that "Claude Code reads CLAUDE.md instead — keep both files in sync when changing project-wide guidance," but CLAUDE.md does not exist in the repository. This creates an unmet dependency. Either:

  • Create CLAUDE.md with the repo-wide agent guidance copied from this file, or
  • Remove the reference to CLAUDE.md from lines 3–6 if it is not actually used by Claude Code.

Additionally, the code blocks at lines 80 (layer diagram), 103 (lesson state diagram), and 175 (commit format) lack language labels and should specify bash, text, or appropriate alternatives for markdown linting compliance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/AGENTS.md` around lines 3 - 6, The AGENTS.md references a
non-existent CLAUDE.md and has unlabeled code blocks; either create a new
CLAUDE.md containing the same repo-wide agent guidance (copy contents of
AGENTS.md) or remove the “Claude Code reads `CLAUDE.md` instead” reference from
AGENTS.md so there’s no unmet dependency, and update the three code fences (the
layer diagram, the lesson state diagram, and the commit format blocks) to
include explicit language labels (e.g., bash or text) to satisfy markdown
linting; search for the string "CLAUDE.md" and the unlabeled diagram/commit code
blocks in AGENTS.md to apply the fixes.

Comment thread Gradata/AGENTS.md
Comment on lines +80 to +85
```
Layer 2 — Public API brain.py, cli.py, daemon.py, mcp_server.py
Layer 1 — Enhancements enhancements/*, rules/
Layer 0 — Primitives _types.py, _db.py, _events.py, _paths.py, _file_lock.py …
Layer 0 — Patterns contrib/patterns/*
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language tags to the fenced blocks.

markdownlint is already flagging these three fences. Labeling them keeps the new guide lint-clean and avoids adding permanent doc-noise to CI.

Also applies to: 103-107, 175-181

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 80-80: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/AGENTS.md` around lines 80 - 85, The fenced code blocks showing the
layer lists (e.g., the block containing "Layer 2 — Public API        brain.py,
cli.py, daemon.py, mcp_server.py" and the other similar blocks later) are
missing language tags; update those fenced blocks to include a language tag such
as ```text (apply the same change to the other two fenced blocks referenced in
the review) so markdownlint stops flagging them and the file remains lint-clean.

Comment thread Gradata/bench/pmr_100.py
Comment on lines +60 to +61
from gradata import Brain, Lesson, LessonState
from gradata._types import CorrectionType

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The benchmark never actually drives the advertised CorrectionType classes.

Scenario.correction_class is only copied into the result; run_one_session() never passes it into brain.correct() or any lower-level correction path. On top of that, the corpus comment says “24 scenarios across 6 classes,” but this file currently defines 25 scenarios and only five distinct class labels. That makes the per-class breakdown and the “across all 6 classes” acceptance gate non-actionable.

Also applies to: 78-80, 249-257

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/bench/pmr_100.py` around lines 60 - 61, Scenario.correction_class is
never used by the correction path and the scenario corpus/counts are
inconsistent; update run_one_session() to forward each Scenario.correction_class
into the correction path (either by passing it into brain.correct(...,
correction_class=...) or by constructing Lesson/ LessonState with the
correction_class so downstream correction uses it), ensure you reference
Scenario.correction_class, run_one_session(), brain.correct(), Lesson and
LessonState when making the change, and then fix the scenario definitions so
there are exactly 24 scenarios and six distinct correction class labels (or
update the corpus comment to match) so the per-class breakdown and “across all 6
classes” assertion is accurate.

Comment thread Gradata/bench/pmr_100.py
Comment on lines +265 to +272
# apply_brain_rules returns a formatted string (not a list of rule objects).
# We score by whether expected keywords appear in the rendered prompt block.
text_lower = rules_text.lower()
recall_at_1 = bool(rules_text) and any(kw.lower() in text_lower
for kw in scenario.expected_keywords)
# @3 collapses to @1 with a string return — kept for compatibility / future
# variant where we score per-rule chunks.
recall_at_3 = recall_at_1

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

recall@3 is not a real metric in the current scorer.

Setting recall_at_3 = recall_at_1 means this benchmark can never distinguish top-1 retrieval from top-3 retrieval, so the documented ≥85% recall@3 ship gate is unverifiable. Either expose ranked matches from the probing API or parse/scored rule chunks before keeping a separate recall@3 field.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/bench/pmr_100.py` around lines 265 - 272, The scorer currently
collapses recall@3 to recall@1 because apply_brain_rules returns a single
formatted string (rules_text), so change the pipeline to produce ranked rule
matches (or parse into discrete rule chunks) and compute recall@3 from the top-3
ranked items: update apply_brain_rules (or its caller) to return a list/ordered
sequence of rule strings or match objects rather than a single concatenated
string, then replace the current boolean recall_at_1 and recall_at_3 logic with
checks that compute recall_at_1 = any(expected keyword in top-1) and recall_at_3
= any(expected keyword in top-3) using the returned list (reference symbols:
apply_brain_rules, rules_text, recall_at_1, recall_at_3,
scenario.expected_keywords).

Comment thread Gradata/bench/README.md
Comment on lines +35 to +40
Interpretation: `Brain.correct()` does not synchronously extract rules that
are returned by `apply_brain_rules()` on the first probe. Either the
extraction is async/queued (daemon-only), graduation thresholds prevent the
rule from being callable, or the matching path needs warm-up. **This is the
work.** Track this number on every PR. Ship when ≥70% recall@1 across all 6
correction classes — Anthropic-tier memory bench results.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep the ship gate aligned with the full acceptance criteria.

This says to ship once recall@1 clears 70%, but the documented benchmark gate also includes recall@3 ≥85% and median session <2s. Leaving those out here will make future PMR-100 runs look acceptable when they are still below the stated bar.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/bench/README.md` around lines 35 - 40, The README currently states
the ship gate as "ship when ≥70% recall@1" but omits the other acceptance
criteria; update the Bench README text so the gate explicitly includes all three
metrics from the benchmark (recall@1 ≥70%, recall@3 ≥85%, and median session
<2s) and reference the relevant checks (e.g., Brain.correct(),
apply_brain_rules()) or the benchmark class names for the six correction classes
to ensure reviewers and automated PMR-100 runs validate all criteria together
rather than only recall@1.

Comment on lines +200 to +223
def _call_gemini(
model: str, system: str, user_prompt: str, max_tokens: int, timeout: float
) -> str | None:
key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY")
if not key:
_log.debug("synth: GOOGLE_API_KEY / GEMINI_API_KEY not set")
return None
try:
from google import genai
from google.genai import types as genai_types

client = genai.Client(api_key=key)
config = genai_types.GenerateContentConfig(
system_instruction=system,
max_output_tokens=max_tokens,
)
resp = client.models.generate_content(
model=model,
contents=user_prompt,
config=config,
)
text = resp.text
return text.strip() if text else None
except Exception as exc:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In the current google-genai Python SDK, what is the correct way to enforce a per-request timeout for client.models.generate_content(...), and which parameter(s) should be used?

💡 Result:

In the current google-genai Python SDK (googleapis/python-genai), the correct way to enforce a per-request timeout for client.models.generate_content(...) is to use the http_options parameter within the config argument. Use types.GenerateContentConfig with http_options=types.HttpOptions(timeout=milliseconds). Example: from google import genai from google.genai import types client = genai.Client(api_key='your-api-key') response = client.models.generate_content( model='gemini-2.5-flash', contents='Why is the sky blue?', config=types.GenerateContentConfig( http_options=types.HttpOptions(timeout=30000) # 30 seconds ) ) The timeout is specified in milliseconds. This applies the timeout specifically to that generate_content request. For a global timeout across all requests from the client, pass http_options to the Client constructor: client = genai.Client( api_key='your-api-key', http_options=types.HttpOptions(timeout=30000) ) Note: There are some reported issues with timeouts not always working as expected in certain scenarios (e.g., long generations or async streaming), but this is the documented and intended method per official SDK docs and examples.

Citations:


🏁 Script executed:

# First, let's examine the specific file and line range mentioned
cd Gradata && find . -name "rule_synthesizer.py" -type f

Repository: Gradata/gradata

Length of output: 106


🏁 Script executed:

# Read the specific lines from the file
head -230 src/gradata/enhancements/rule_synthesizer.py | tail -40

Repository: Gradata/gradata

Length of output: 163


🏁 Script executed:

# Read the specific lines from the file with correct path
head -230 Gradata/src/gradata/enhancements/rule_synthesizer.py | tail -40

Repository: Gradata/gradata

Length of output: 1298


Apply timeout to Gemini API requests using http_options.

The timeout parameter is accepted but ignored. The google-genai SDK requires passing http_options=types.HttpOptions(timeout=...) to GenerateContentConfig to enforce per-request timeouts. Without this, calls can hang indefinitely and distort benchmarks.

Update the GenerateContentConfig call to include:

http_options=genai_types.HttpOptions(timeout=int(timeout * 1000))  # timeout in milliseconds
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 200 - 223,
The _call_gemini function currently ignores the timeout parameter; update the
GenerateContentConfig invocation inside _call_gemini to pass http_options using
genai_types.HttpOptions with the timeout in milliseconds (e.g.,
http_options=genai_types.HttpOptions(timeout=int(timeout * 1000))), ensuring the
float seconds timeout argument is converted to ms and applied to the Gemini
request.

Comment on lines +326 to +329
cached = _read_cache(brain_dir, cache_key)
if cached:
_log.debug("synth cache hit: %s", cache_key)
return cached

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate cache contents before returning them.

On cache hit, the function returns raw cached text without re-validating <brain-wisdom> shape/length. A truncated or corrupted cache entry can bypass the fail-safe path and be injected as-is.

Suggested fix
     cached = _read_cache(brain_dir, cache_key)
     if cached:
-        _log.debug("synth cache hit: %s", cache_key)
-        return cached
+        cached_block = _extract_wisdom_block(cached)
+        if cached_block and len(cached_block) >= 50:
+            _log.debug("synth cache hit: %s", cache_key)
+            return cached_block
+        _log.debug("synth cache invalid, regenerating: %s", cache_key)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 326 - 329,
The cached value returned from _read_cache(brain_dir, cache_key) must be
validated before returning: inspect the cached string (variable cached) to
ensure it contains a well-formed <brain-wisdom> block and expected length/fields
(use the same validation/parsing logic used elsewhere in rule_synthesizer, or
implement a small validator function like validate_brain_wisdom that checks tags
and minimum length); if validation fails, log a warning via
_log.warning/_log.debug, discard the cached entry (do not return it), and fall
through to the normal synthesis/fail-safe path (optionally remove or invalidate
the cache file). Ensure this change is applied where the cache hit is handled so
_read_cache and the calling code (the block using cached, cache_key, brain_dir)
never return corrupted/truncated wisdom.

Comment on lines +17 to 20
logger = logging.getLogger(__name__)


_log = logging.getLogger(__name__)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate logger definitions — use _log consistently.

Line 17 introduces logger = logging.getLogger(__name__) but line 20 already defines _log = logging.getLogger(__name__). This creates two identical logger references. Line 745 uses logger.warning(...) while the rest of the file uses _log.debug(...).

Pick one name and use it consistently throughout the module.

🔧 Proposed fix
-logger = logging.getLogger(__name__)
-
-
-_log = logging.getLogger(__name__)
+_log = logging.getLogger(__name__)

And at line 745:

-        logger.warning('Suppressed exception in _log_outcome', exc_info=True)
+        _log.warning('Suppressed exception in _log_outcome', exc_info=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_to_hook.py` around lines 17 - 20, There
are duplicate logger variables (logger and _log); choose one name and use it
everywhere—prefer using the existing _log variable (defined as _log =
logging.getLogger(__name__)); remove the redundant logger =
logging.getLogger(...) definition and replace all usages of logger (e.g., the
logger.warning(...) call near the current line 745) with _log.warning(...) so
the module consistently uses _log for logging.

max_row = conn.execute(
"SELECT COALESCE(MAX(session), 0) FROM events"
).fetchone()
max_row = conn.execute("SELECT COALESCE(MAX(session), 0) FROM events").fetchone()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider breaking this line for improved readability.

This line exceeds the PEP 8 recommended 79-character limit (~97 characters). Multi-line formatting would improve readability, especially for SQL queries.

♻️ Suggested formatting
-        max_row = conn.execute("SELECT COALESCE(MAX(session), 0) FROM events").fetchone()
+        max_row = conn.execute(
+            "SELECT COALESCE(MAX(session), 0) FROM events"
+        ).fetchone()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
max_row = conn.execute("SELECT COALESCE(MAX(session), 0) FROM events").fetchone()
max_row = conn.execute(
"SELECT COALESCE(MAX(session), 0) FROM events"
).fetchone()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/scoring/brain_scores.py` at line 104, The
single long SQL exec line exceeds PEP8 width; refactor the call in the scope
where conn.execute is used (around the variable max_row) by splitting the SQL
into a separate multi-line string or using implicit string
concatenation/parentheses and then pass that variable to conn.execute (e.g.,
query = ("SELECT COALESCE(MAX(session), 0) " "FROM events"); max_row =
conn.execute(query).fetchone()), keeping the variable name max_row and the
conn.execute(...).fetchone() call intact.

system_health = round(
min(100.0, (gate_sessions / total_sessions) * 100.0), 1
)
system_health = round(min(100.0, (gate_sessions / total_sessions) * 100.0), 1)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider breaking this expression across multiple lines.

This line (~95 characters) exceeds PEP 8's 79-character recommendation and contains nested operations that would be clearer when formatted across multiple lines.

♻️ Suggested formatting
-            system_health = round(min(100.0, (gate_sessions / total_sessions) * 100.0), 1)
+            system_health = round(
+                min(100.0, (gate_sessions / total_sessions) * 100.0), 1
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
system_health = round(min(100.0, (gate_sessions / total_sessions) * 100.0), 1)
system_health = round(
min(100.0, (gate_sessions / total_sessions) * 100.0), 1
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/scoring/brain_scores.py` at line 123, The
single-line calculation for system_health is too long and hard to read; refactor
the expression by introducing intermediate values or breaking the parenthesized
expression across multiple lines: compute the ratio (gate_sessions /
total_sessions), compute the percentage (ratio * 100.0) and apply min(100.0,
...), then round that result and assign to system_health; target the assignment
to system_health and the operands gate_sessions and total_sessions in your
change (or wrap the existing expression in parentheses with line breaks) to
satisfy PEP8 line-length and improve clarity.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review continued from previous batch...

Comment on lines +1 to +26
"""
Gradata Dashboard — Your AI's fitness tracker.
===============================================
Run: streamlit run C:/Users/olive/SpritesWork/brain/scripts/dashboard.py
"""

import json
import re
import sqlite3
from datetime import datetime
from pathlib import Path

import pandas as pd
import plotly.graph_objects as go
import streamlit as st

# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
BRAIN_DIR = Path("C:/Users/olive/SpritesWork/brain")
DB_PATH = BRAIN_DIR / "system.db"
EVENTS_PATH = BRAIN_DIR / "events.jsonl"
LESSONS_PATH = BRAIN_DIR / "lessons.md"
PROSPECTS_DIR = BRAIN_DIR / "prospects"
BRIEF_PATH = BRAIN_DIR / "morning-brief.md"
TASKS_DIR = Path("C:/Users/olive/.claude/scheduled-tasks")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop this archived local-only file from the PR.

This file is committed under .archive/ and hard-codes one developer's Windows paths/user name (C:/Users/olive/...). That leaks local environment details into the repo and makes the dashboard unusable anywhere else without manual edits.

Based on learnings: Never commit scratch files (.tmp/, .archive/, sessions/handoff-*.md, files named 0 or BrainDetail); these belong in .gitignore.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py` around lines 1
- 26, Remove this archived local-only file from the PR and the repository
history: delete Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py
from the commit (or move it out of the repo to a local backup) and ensure no
other committed files contain hard-coded paths like BRAIN_DIR, DB_PATH,
EVENTS_PATH, LESSONS_PATH, PROSPECTS_DIR, BRIEF_PATH or TASKS_DIR that leak a
developer's local Windows path. Add .archive/ and common scratch patterns (e.g.,
.tmp/, sessions/handoff-*.md, files named 0, BrainDetail) to .gitignore so these
files are not committed in future, and run git rm --cached for any
already-tracked instances before committing the updated .gitignore.

Comment on lines +212 to +214
if st.sidebar.button("Refresh"):
st.cache_resource.clear()
st.rerun()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Refresh does not invalidate the event projections.

The button clears only st.cache_resource(), but the learning panels are backed by @st.cache_data(ttl=60). After clicking Refresh, those views can still show stale events.jsonl data until the TTL expires.

Suggested fix
 if st.sidebar.button("Refresh"):
+    st.cache_data.clear()
     st.cache_resource.clear()
     st.rerun()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py` around lines
212 - 214, The Refresh handler only clears st.cache_resource but not the cached
learning-panel data which uses `@st.cache_data`(ttl=60); update the button
callback (the st.sidebar.button block) to also clear the cached data by calling
st.cache_data.clear() (or the appropriate Streamlit cache_data clearing API)
before calling st.rerun() so event projections backed by `@st.cache_data` are
invalidated immediately.

Comment thread Gradata/bench/pmr_100.py
Comment on lines +342 to +343
"median_session_seconds": round(
sorted(r.elapsed_seconds for r in results)[len(results) // 2] if results else 0, 3),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The reported median session time is wrong for even-sized runs.

For the default 100-session run, this picks the upper middle element instead of the median, so the < 2s median session acceptance check can be materially misreported.

Suggested fix
+    elapsed = sorted(r.elapsed_seconds for r in results)
     summary = {
         "total_sessions": total,
         "rules_extracted_pct": round(100 * extracted / total, 1) if total else 0.0,
         "recall_at_1_pct": round(100 * r1 / total, 1) if total else 0.0,
         "recall_at_3_pct": round(100 * r3 / total, 1) if total else 0.0,
         "errors": len(errors),
         "median_session_seconds": round(
-            sorted(r.elapsed_seconds for r in results)[len(results) // 2] if results else 0, 3),
+            (
+                elapsed[total // 2]
+                if total % 2 == 1
+                else (elapsed[total // 2 - 1] + elapsed[total // 2]) / 2
+            )
+            if elapsed
+            else 0,
+            3,
+        ),
         "by_class": dict(by_class),
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/bench/pmr_100.py` around lines 342 - 343, The median calculation for
"median_session_seconds" incorrectly picks the upper-middle element for
even-sized "results"; update the computation to compute the true median by
sorting the list of r.elapsed_seconds and if the count is odd take the middle
element, otherwise average the two middle elements (or simply use
statistics.median on the list) and then round to 3 decimals so
"median_session_seconds" reflects the correct median for both even and odd
lengths.

Comment on lines +48 to +59
# Aliases — accept the Supabase-native env var names too, so a single .env
# works for both the cloud backend service and the SDK push path.
ENV_URL_ALIAS: Final[str] = "GRADATA_SUPABASE_URL"
ENV_KEY_ALIAS: Final[str] = "GRADATA_SUPABASE_SERVICE_KEY"


def _env_url() -> str:
return os.environ.get(ENV_URL) or os.environ.get(ENV_URL_ALIAS) or ""


def _env_key() -> str:
return os.environ.get(ENV_KEY) or os.environ.get(ENV_KEY_ALIAS) or ""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Does Supabase's service_role/service key bypass Row Level Security, and is it recommended to expose that key in end-user or distributed client applications?

💡 Result:

Yes, Supabase's service_role key (or the equivalent secret key) bypasses Row Level Security (RLS), granting full access to the database. It is not recommended to expose this key in end-user or distributed client applications, as it should only be used in secure server-side environments.

Citations:


🏁 Script executed:

# Read the full _cloud_sync.py file to understand its purpose
wc -l Gradata/src/gradata/_cloud_sync.py

Repository: Gradata/gradata

Length of output: 98


🏁 Script executed:

# Get file size first, then read relevant sections
head -100 Gradata/src/gradata/_cloud_sync.py

Repository: Gradata/gradata

Length of output: 3426


🏁 Script executed:

# Search for how ENV_KEY and ENV_KEY_ALIAS are documented/used
rg -B 5 -A 5 "ENV_KEY|ENV_URL" Gradata/src/gradata/_cloud_sync.py | head -100

Repository: Gradata/gradata

Length of output: 878


🏁 Script executed:

# Look for module docstring and imports to understand the purpose
head -50 Gradata/src/gradata/_cloud_sync.py

Repository: Gradata/gradata

Length of output: 1918


🏁 Script executed:

# Search the codebase for references to _cloud_sync to understand how this module is used
rg "_cloud_sync|from.*_cloud_sync" Gradata/src/gradata --type py -A 2

Repository: Gradata/gradata

Length of output: 3552


🏁 Script executed:

# Check if there's a README or docs explaining the architecture
find Gradata -name "README*" -o -name "*.md" | head -10 | xargs grep -l "cloud" 2>/dev/null

Repository: Gradata/gradata

Length of output: 118


Do not accept a Supabase service key in the SDK sync path.

This module is client-side cloud sync that explicitly documents expecting GRADATA_CLOUD_KEY as a "Supabase anon key (RLS-scoped to tenant)" and relies on RLS for tenant isolation. Accepting GRADATA_SUPABASE_SERVICE_KEY as a fallback is a security boundary violation: service keys bypass Row Level Security entirely, converting a tenant-scoped push path into a cross-tenant write path if misconfigured.

Suggested fix
-ENV_KEY_ALIAS: Final[str] = "GRADATA_SUPABASE_SERVICE_KEY"
-
-
 def _env_key() -> str:
-    return os.environ.get(ENV_KEY) or os.environ.get(ENV_KEY_ALIAS) or ""
+    return os.environ.get(ENV_KEY) or ""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_cloud_sync.py` around lines 48 - 59, The code currently
accepts a Supabase service key fallback via ENV_KEY_ALIAS
(GRADATA_SUPABASE_SERVICE_KEY) which is a security risk; update _env_key to only
return the client/anon key (ENV_KEY, i.e. GRADATA_CLOUD_KEY) and stop reading
ENV_KEY_ALIAS, and remove or stop referencing ENV_KEY_ALIAS (and its declaration
ENV_KEY_ALIAS) so the SDK sync path cannot accept service keys; ensure only
ENV_KEY (GRADATA_CLOUD_KEY) is consulted and fallback behavior is limited to an
empty string if unset.

Comment on lines +516 to +535
transformed = []
for r in rows:
try:
transformed.append(_transform_row(table, r, tenant_id))
except Exception as exc:
_log.warning("cloud_sync: skipping malformed row in %s: %s", table, exc)
all_ok = False
if not transformed:
continue
accepted, error = _post(table, transformed)
pushed[table] = accepted
if accepted != len(rows):
if error is not None and last_error is None:
last_error = {**error, "table": table}
if accepted != len(transformed):
all_ok = False
if pushed and all_ok:
_mark_push(conn, tenant_id, started)
_clear_push_error(brain)
elif last_error is not None:
_record_push_error(brain, last_error)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Transformation failures still create a silent infinite retry loop.

When _transform_row() throws, this path sets all_ok = False and leaves the watermark unchanged, but last_error remains None, so cloud_push_error.json is never written. The same bad row will then block every subsequent push while gradata doctor stays blind to the root cause.

Suggested fix
             for r in rows:
                 try:
                     transformed.append(_transform_row(table, r, tenant_id))
                 except Exception as exc:
                     _log.warning("cloud_sync: skipping malformed row in %s: %s", table, exc)
+                    if last_error is None:
+                        last_error = {
+                            "code": 0,
+                            "message": f"transform failed: {type(exc).__name__}",
+                            "constraint_violation": False,
+                            "table": table,
+                        }
                     all_ok = False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_cloud_sync.py` around lines 516 - 535, When a call to
_transform_row(table, r, tenant_id) raises, add logic in the except block to set
last_error (if not already set) to include the exception details, table name and
identifying row info so the failure is recorded; keep setting all_ok = False and
continue, then let the existing branch call _record_push_error(brain,
last_error) so the bad row is written to cloud_push_error.json for diagnostics
(update the except block that currently only logs to _log.warning to also assign
last_error when missing).

Comment on lines +1405 to +1409
with sqlite3.connect(db_path) as conn:
rows = conn.execute(
"SELECT data_json FROM events WHERE type = 'CORRECTION' AND session = ?",
(session_number,),
).fetchall()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope the session correction query by tenant_id.

cloud_sync_tick() reads every CORRECTION row for the session number, but earlier fixes in this repo already had to add tenant-scoped event queries to avoid cross-tenant bleed. If a shared or imported DB contains multiple tenants, this will attribute the wrong corrections to the current brain and can leak another tenant's telemetry upstream.

Suggested fix
         import json as _json
         import sqlite3
         from pathlib import Path as _Path
 
+        from gradata._tenant import tenant_for as _tenant_for
+
         bd = _Path(brain_dir)
         if not bd.is_dir():
             return
+        tenant_id = _tenant_for(bd)
 
         all_lessons: list[Lesson] = []
@@
                 with sqlite3.connect(db_path) as conn:
                     rows = conn.execute(
-                        "SELECT data_json FROM events WHERE type = 'CORRECTION' AND session = ?",
-                        (session_number,),
+                        "SELECT data_json FROM events "
+                        "WHERE tenant_id = ? AND type = 'CORRECTION' AND session = ?",
+                        (tenant_id, session_number),
                     ).fetchall()

Based on learnings: _events.py previously needed a tenant-scoped fallback SELECT for dedup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_core.py` around lines 1405 - 1409, The SELECT in
cloud_sync_tick reads all CORRECTION rows for a session but is missing tenant
scoping; modify the query in the block that opens sqlite3.connect(db_path)
(inside cloud_sync_tick in _core.py) to include "AND tenant_id = ?" (or the
correct tenant column name used elsewhere) and add the current tenant identifier
to the parameter tuple so the query only returns corrections for the current
tenant/session pair, mirroring the tenant-scoped fallback used in _events.py.

Comment on lines +48 to +61
if self._session_id:
for d in all_dirs:
candidate = d / f"{self._session_id}.jsonl"
if candidate.is_file():
return candidate

# Fallback: most-recently modified JSONL across all project dirs.
all_jsonls: list[Path] = []
for d in all_dirs:
try:
all_jsonls.extend(f for f in d.iterdir() if f.suffix == ".jsonl")
except OSError:
continue
return max(all_jsonls, key=lambda p: p.stat().st_mtime) if all_jsonls else None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not fall back to the newest Claude transcript globally.

If the exact {session_id}.jsonl file is missing, _locate() currently grabs the most recently modified transcript under ~/.claude/projects/. On a machine with multiple active Claude sessions, that can bind another project's conversation to this brain and mine unrelated corrections into local rules. Fail closed here unless the caller can provide an exact session file.

Suggested fix
-        # Fallback: most-recently modified JSONL across all project dirs.
-        all_jsonls: list[Path] = []
-        for d in all_dirs:
-            try:
-                all_jsonls.extend(f for f in d.iterdir() if f.suffix == ".jsonl")
-            except OSError:
-                continue
-        return max(all_jsonls, key=lambda p: p.stat().st_mtime) if all_jsonls else None
+        return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if self._session_id:
for d in all_dirs:
candidate = d / f"{self._session_id}.jsonl"
if candidate.is_file():
return candidate
# Fallback: most-recently modified JSONL across all project dirs.
all_jsonls: list[Path] = []
for d in all_dirs:
try:
all_jsonls.extend(f for f in d.iterdir() if f.suffix == ".jsonl")
except OSError:
continue
return max(all_jsonls, key=lambda p: p.stat().st_mtime) if all_jsonls else None
if self._session_id:
for d in all_dirs:
candidate = d / f"{self._session_id}.jsonl"
if candidate.is_file():
return candidate
return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_transcript_providers.py` around lines 48 - 61, In
_locate(), stop the global "newest JSONL" fallback: if self._session_id is set
but no exact candidate file is found among all_dirs, return None (fail closed)
instead of scanning all_dirs for the most-recently modified JSONL; remove or
disable the all_jsonls/max(...) fallback logic so the method only returns an
exact session match (use symbols _locate, self._session_id, all_dirs) and
otherwise returns None.

@Gradata

Gradata commented May 1, 2026

Copy link
Copy Markdown
Owner Author

Superseded by clean rebase rebase/pmr-100-benchmark.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant