
Pre-launch ultra-review: meta-rules port, legacy-gate cleanup, graduation bug fix #126

Closed
Gradata wants to merge 16 commits into main from pre-launch-ultra-review

Conversation

@Gradata

@Gradata Gradata commented Apr 21, 2026

Owner

Summary

  • Port meta-rule discovery to local-first (removes cloud-gate vestiges): category grouping + greedy semantic-similarity clustering, zombie filter for RULE lessons below 0.90, decay after 20 sessions, count/(count+3) confidence smoothing.
  • Fix real bug in agent_graduation: MISFIRE_PENALTY = -0.15 was being subtracted (double-negative added confidence on rejection). Previously masked as an xfail with "API drift, reconcile in v0.7". Now aligned with canonical _confidence.py usage.
  • Aggressive skip/xfail cleanup: from 22 skipped + 2 xfailed → 3 skipped + 0 xfailed.
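
A minimal sketch of the `count/(count+3)` smoothing named in the first bullet may help. Only the formula comes from this summary; the function name `smoothed_confidence` and the `prior` parameter are hypothetical and are not taken from `meta_rules.py`.

```python
def smoothed_confidence(count: int, prior: int = 3) -> float:
    """Return count / (count + prior); prior=3 matches the formula in the summary."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return count / (count + prior)


# Small groups are pulled toward zero; larger groups approach 1.0.
for n in (1, 3, 9):
    print(n, smoothed_confidence(n))
```

The effect is that a meta-rule backed by a single lesson cannot report high confidence, while one backed by many lessons converges toward 1.0.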

Commits (15)

  • meta_rules local-first port + unskip cloud-gated tests
  • pipeline_e2e: remove stale "discover_meta_rules not yet implemented" skips
  • agent_graduation MISFIRE_PENALTY sign fix + drop both xfails
  • Integration-workflow tests: drop bogus API-key guard (they only test local brain ops)
  • Delete redundant/exploration-only tests (test_real_mem0_roundtrip, test_with_real_data)
  • Docs truth-pass for cloud-vs-SDK boundary (earlier in chain)
  • Implicit-feedback text-speak coverage (r/u/dont/cant)
  • Doctor: cloud-health probing
  • Pipeline: canonical graduation + persistent brain_prompt + two-provider synth
  • Cloud sync: schema transform + JSONB scrub
  • Plus: add docs/LEGACY_CLEANUP.md tracker

Ultra-review status

Green across 4 parallel reviewers (silent-failure-hunter, code-reviewer, pr-test-analyzer, leak-scanner) prior to commit. Private strategy docs redacted from the public surface in an earlier commit.

Test plan

  • Full suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed (3m 19s local)
  • The 3 remaining skips are fcntl POSIX paths in test_file_lock.py, which cannot run on Windows; the complementary Windows paths skip on Linux. Running both platforms in the CI matrix gives full coverage.
  • CI to confirm cross-platform green before merge

Generated with Gradata

Gradata and others added 15 commits April 20, 2026 15:16
Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json`
vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns`
-> `corrections`). Added `_transform_row` per-table mapper with deterministic
uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone
UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each
batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's
opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` /
`GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both
backend and SDK.
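
The two ideas in this message, deterministic ids and JSONB scrubbing, can be sketched as follows. This is an illustration under assumptions, not the code in `_cloud_sync.py`: the helper names, the choice of `uuid.NAMESPACE_URL`, and the `table:key` name format are all hypothetical.

```python
import uuid

# Hypothetical namespace; the commit does not say which one _transform_row uses.
_NAMESPACE = uuid.NAMESPACE_URL


def deterministic_id(table: str, local_key: str) -> str:
    """The same (table, key) pair always yields the same UUID, so re-pushes upsert."""
    return str(uuid.uuid5(_NAMESPACE, f"{table}:{local_key}"))


def scrub(value):
    """Recursively drop NUL bytes and lone UTF-16 surrogates from strings."""
    if isinstance(value, str):
        return "".join(
            ch for ch in value
            if ch != "\x00" and not (0xD800 <= ord(ch) <= 0xDFFF)
        )
    if isinstance(value, dict):
        return {scrub(k): scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value
```

In a Python `str`, any code point in the surrogate range is necessarily unpaired, which is why a simple range check is enough here.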

Co-Authored-By: Gradata <noreply@gradata.ai>
…provider synth

Phase 1 of the learning-pipeline revamp. Rule graduation now flows through
the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN,
>= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline.
Injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED
header, regenerated only at session_close after the pipeline fires. LLM
synthesis gets a two-provider path: anthropic SDK (ANTHROPIC_API_KEY) with
claude CLI fallback (Max-plan OAuth) so users without an exportable key
still get synthesis. Meta-rule deterministic fallback now warns loudly
instead of silently discarding. Drops five env-flag gates in favour of
file-based signals.

Co-Authored-By: Gradata <noreply@gradata.ai>
Adds --cloud / --no-cloud flags to the doctor CLI command and the
underlying diagnose() function. Flips the default cloud endpoint to
api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py
(all passing).

Co-Authored-By: Gradata <noreply@gradata.ai>
Regex coverage was brittle to shorthand: real corrections like
"Why r you not asking" and "Why flag.. we dont skip" slipped the
\bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK
events. That silently breaks Gradata's core promise ("learn from any
correction").

Adds:
- negation: dont/cant/shouldnt (no-apostrophe variants), never
- reminder: "again" marker, "dont forget"
- challenge: "why r u", "why not/r/are/is/does", "why word..",
  "how come", "you missed/forgot/failed/didnt"

All 8 target phrases now detect. 25 existing implicit-feedback tests
remain green.
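
To make the shorthand handling concrete, here is a small sketch of apostrophe-optional and text-speak patterns. The patterns and the `detect` helper are illustrative assumptions; the actual list in `implicit_feedback.py` is not reproduced in this message and may differ.

```python
import re

# Illustrative patterns only; the real list in implicit_feedback.py may differ.
NEGATION = re.compile(r"\b(?:don'?t|can'?t|shouldn'?t|never)\b", re.I)
CHALLENGE = re.compile(r"\bwhy\s+(?:r|are)\s+(?:u|you)\b|\bhow come\b", re.I)


def detect(text: str) -> set[str]:
    """Return the set of signal types whose pattern matches the text."""
    hits = set()
    if NEGATION.search(text):
        hits.add("negation")
    if CHALLENGE.search(text):
        hits.add("challenge")
    return hits
```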

Co-Authored-By: Gradata <noreply@gradata.ai>
14 new tests pinning the regex expansion from 5a6da45. Covers real
corrections observed this session ("Why r you not asking council",
"Why flag.. we don't skip we do work") plus shorthand cases
(dont / cant / again / you missed / how come). Dual-signal cases
assert both types detect. Full suite: 37 passed, 1 pre-existing skip.

Co-Authored-By: Gradata <noreply@gradata.ai>
Five post-launch metrics with precise definitions (activation, D7
retention, time-to-first-graduation, free->Pro conversion,
correction-rate decay). Numeric triggers: pivot <20% activation +
flat decay at D30; kill <100 installs at D60; scale >1K installs +
>=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8
of the pre-launch gap analysis.

Co-Authored-By: Gradata <noreply@gradata.ai>
The source-provenance docstring referenced "cloud-side LLM synthesis"
which is stale since the graduation-cloud-gate was removed. Synthesis
runs on the user's machine via rule_synthesizer.py's two-provider path
(Anthropic SDK with user's key, or Claude Code Max CLI OAuth).

Co-Authored-By: Gradata <noreply@gradata.ai>
Graduation and meta-rule LLM synthesis run entirely locally as of a
few sessions ago (rule_synthesizer.py uses user's own Anthropic key or
Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly
still claimed "cloud runs better graduation engine" and implied a
cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy
paragraph to match reality: free is functionally complete; Pro is
visualization, history, export, and the future community corpus.

NOTE: this file is listed in .gitignore per the earlier
"untrack private files" cleanup. Force-added at request.

Co-Authored-By: Gradata <noreply@gradata.ai>
Test was checking the pre-transform local key name. _cloud_sync._transform_row
correctly emits brain_id (cloud schema) from tenant_id (local schema); the
assertion was stale.

Co-Authored-By: Gradata <noreply@gradata.ai>
Previously nothing wrote to lesson_applications — the table existed
(onboard.py), was size-checked (_validator.py), and synced to cloud
(_cloud_sync.py), but no code ever inserted a row. The compound-quality
story had no evidence: rules claimed to fire with no receipt.

Now:
- inject_brain_rules writes one PENDING row per injected rule (cluster
  members included), storing {category, description, task} in context so
  session_close can attribute outcomes back to specific rules.
- session_close resolves PENDING rows at end-of-waterfall:
    REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the
    session shares the lesson's category (or description substring).
    CONFIRMED otherwise (rule survived the session).

Both paths are best-effort — DB missing, schema drift, or IO errors
degrade silently rather than blocking injection or session close.
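
The resolution rule described above can be sketched roughly as below. The `Event` dataclass and the `resolve` function are hypothetical stand-ins; the real `session_close` code reads rows from the database and is not shown in this message.

```python
from dataclasses import dataclass

NEGATIVE_EVENTS = {"CORRECTION", "IMPLICIT_FEEDBACK", "RULE_FAILURE"}


@dataclass
class Event:
    """Hypothetical in-memory view of a session event."""
    type: str
    category: str = ""
    text: str = ""


def resolve(lesson_category: str, lesson_description: str, events: list[Event]) -> str:
    """REJECTED if a negative event shares the category or mentions the description."""
    for ev in events:
        if ev.type not in NEGATIVE_EVENTS:
            continue
        same_category = bool(ev.category) and ev.category == lesson_category
        mentions = bool(lesson_description) and lesson_description.lower() in ev.text.lower()
        if same_category or mentions:
            return "REJECTED"
    return "CONFIRMED"
```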

Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually
fire and survive?" — the answer now has a row-level audit trail.

Co-Authored-By: Gradata <noreply@gradata.ai>
Sweeps the remaining docs that still claimed cloud gated any part of
the learning loop. Actual architecture (as of the graduation-local
pivot):

  Local SDK owns: correction capture, graduation, meta-rule clustering
  AND LLM-synthesis (via user's Anthropic key or Claude Code Max OAuth),
  rule-to-hook promotion, manifest computation.

  Cloud owns: dashboard/visualization, cross-device sync, team brains,
  managed backups, future opt-in corpus donation.

Files touched:
- docs/cloud/overview.md — capability matrix, architecture diagram, use-when guidance.
- docs/architecture/cloud-monolith-v2.md — cloud-side workload framing.
- docs/architecture/multi-tenant-future-proofing.md — proprietary boundary, verification flow.
- docs/concepts/meta-rules.md — synthesis is local, not cloud-gated.
- docs/cloud/dashboard.md — dashboard visualizes local output, does not re-synthesize.

README.md was already accurate; no changes there.

Co-Authored-By: Gradata <noreply@gradata.ai>
Silent-failure-hunter CRITICAL-1:
- inject_brain_rules: wrap lesson_applications connection in try/finally
  and escalate OperationalError to warning (missing-table surfaces).

Silent-failure-hunter CRITICAL-2:
- _cloud_sync.push: per-row try/except on _transform_row so one bad row
  no longer propagates and kills the whole push batch.

Leak scan blockers:
- Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md
  from the public repo; add both to .gitignore. These contain kill
  triggers, pricing, and PII that belong in the private brain vault only.

Code-reviewer BLOCKER-3:
- _doctor._check_vector_store returns status="ok" with FTS5 detail in
  the detail field, restoring the documented status vocabulary
  ({ok, warn, fail, skip, missing, error}).

Test-coverage gaps:
- Add tests/test_rule_synthesizer.py — both providers absent, empty
  input, cache hit, CLI fallback on SDK raise, malformed output.
- Add IMPLICIT_FEEDBACK → REJECTED integration test to
  test_lesson_applications.py.

Verification: full suite 3802 pass, 22 skip, 2 xfailed.
Gradata is fully local-first now. Cloud-gate stubs and "requires cloud"
skip markers were legacy artifacts from an earlier architecture where
discovery/synthesis lived server-side. This commit finishes the port:

- meta_rules.discover_meta_rules + merge_into_meta run locally:
  category grouping + greedy semantic-similarity clustering, zombie
  filter on RULE-state lessons below 0.90, decay after 20 sessions,
  count/(count+3) confidence smoothing.
- Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer,
  test_meta_rule_generalization, test_multi_brain_simulation,
  test_pipeline_e2e. These tests now exercise the local impl directly.
- Retire the api_key-kwarg-on-merge_into_meta path (session-close
  rule_synthesizer drives LLM distillation now).
- Update fixtures to realistic prose so they survive the noise filter
  that rejects "cut:/added:" edit-distance summaries.
- Bump test_meta_rules confidence assertion to the smoothed formula.
- Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges
  (deprecated adapter shims, cloud docs, stale module docstrings).
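
As a rough picture of greedy similarity clustering, the sketch below assigns each lesson to the first cluster whose seed is similar enough, otherwise it starts a new cluster. Token-level Jaccard overlap is used here purely as a stand-in; the commit does not say which similarity measure `meta_rules.py` uses, and `greedy_cluster` is a hypothetical name.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a stand-in for whatever similarity the real code uses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def greedy_cluster(texts: list[str], threshold: float = 0.35) -> list[list[str]]:
    """Single pass: join the first cluster whose seed text is similar enough."""
    clusters: list[list[str]] = []
    for text in texts:
        for cluster in clusters:
            if jaccard(text, cluster[0]) >= threshold:
                cluster.append(text)
                break
        else:
            # No existing cluster matched, so this text seeds a new one.
            clusters.append([text])
    return clusters
```

Greedy assignment is order-dependent, which is the usual trade-off for avoiding a full pairwise pass.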

Suite: 3809 passed, 14 skipped, 2 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>
…xtures

discover_meta_rules is implemented now (local-first). The
  if not metas: pytest.skip('discover_meta_rules not yet implemented')
guards were vestiges from the cloud-only era — convert to real asserts.

Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive
the zombie filter (RULE at <0.90 is treated as a decayed rule).

Suite: 3813 passed, 10 skipped, 2 xfailed.

Remaining skips are all legit:
- test_file_lock.py (2): Windows vs POSIX platform gates
- test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money
- test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY
- test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env

xfails (2) are tracked for v0.7 reconciliation in test docstring.

Co-Authored-By: Gradata <noreply@gradata.ai>
Found while clearing remaining skipped/xfailed tests:

Bug: agent_graduation._update_lesson_confidence had
  confidence = max(0.0, confidence - MISFIRE_PENALTY)
but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added
confidence on rejection. Test test_rejection_decreases_confidence was
xfail'd with 'API drift, reconcile in v0.7' — it was a real bug.

Fix: align with canonical _confidence.py usage (confidence + MISFIRE_PENALTY).
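
The sign error is easy to reproduce in isolation. The two function names below are hypothetical; the constant value and both expressions are taken from this commit message.

```python
MISFIRE_PENALTY = -0.15  # negative constant, as stated in this commit message


def buggy_update(confidence: float) -> float:
    """Old behaviour: subtracting a negative penalty raises confidence."""
    return max(0.0, confidence - MISFIRE_PENALTY)


def fixed_update(confidence: float) -> float:
    """New behaviour: adding the negative penalty lowers confidence."""
    return max(0.0, confidence + MISFIRE_PENALTY)


print(buggy_update(0.5) > 0.5, fixed_update(0.5) < 0.5)
```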

Other cleanups in the same pass:

- test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern
  was also wrong on its own terms — with ACCEPTANCE_BONUS=0.20 the lesson
  graduates straight to RULE (stronger than PATTERN). Accept either state.
- test_integration_workflow: delete stale module-level skipif guarding 5
  tests behind ANTHROPIC/OPENAI keys they never actually use. They only
  exercise local brain.correct/convergence/efficiency — no network.
- test_mem0_adapter: delete test_real_mem0_roundtrip (live-API smoke test
  already covered by the 20+ fake-client tests in the same file).
- test_meta_rules: delete test_with_real_data — dev-time exploration
  script with zero asserts, requiring GRADATA_LESSONS_PATH env var.

Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed.

Remaining 3 skips are test_file_lock.py POSIX paths that require fcntl,
which does not exist on Windows. Complementary Windows paths skip on
Linux — running on each platform covers all 4. Cannot be eliminated.
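
The complementary platform gates can be pictured with the sketch below. It uses stdlib `unittest` decorators so that it runs without extra dependencies; the class and test names are hypothetical, and the use of `msvcrt` for the Windows path is an assumption, since this message only names `fcntl`.

```python
import importlib.util
import sys
import unittest

HAS_FCNTL = importlib.util.find_spec("fcntl") is not None
IS_WINDOWS = sys.platform == "win32"


class FileLockPaths(unittest.TestCase):
    @unittest.skipUnless(HAS_FCNTL, "fcntl is POSIX-only")
    def test_posix_path(self):
        import fcntl
        self.assertTrue(hasattr(fcntl, "flock"))

    # Assumption: the Windows path relies on msvcrt; the commit does not say.
    @unittest.skipUnless(IS_WINDOWS, "msvcrt is Windows-only")
    def test_windows_path(self):
        import msvcrt
        self.assertTrue(hasattr(msvcrt, "locking"))
```

On any single platform one branch always skips, so the skip count cannot reach zero without running both.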

From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>


@coderabbitai

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough
  • Meta-rule discovery: ported to a local-first deterministic pipeline — category grouping, greedy semantic-similarity sub-clustering, zombie filter for RULE lessons below 0.90, session decay (non-mutating), and confidence smoothing (count/(count+3)); removes cloud-gate dependency.
  • Bug fix: agent_graduation MISFIRE_PENALTY corrected — rejection now decreases confidence via max(0.0, confidence + MISFIRE_PENALTY) (fixes previous double-negative behavior).
  • New public API: synthesize_rules_block(...) added — builds cached blocks with Anthropic SDK → Claude CLI fallback and fails gracefully (returns None) when providers are unavailable.
  • Cloud sync hardening: deterministic UUID mapping, per-row transform with table remap, JSON payload scrub (NUL/UTF-16 surrogate removal), in-batch dedupe, and POST batch chunking (500); push() now compares acceptance to transformed row count.
  • CLI/doctor enhancements: diagnose(...) signature extended to (brain_dir, include_cloud: bool = True, cloud_only: bool = False) and doctor command gains --cloud / --no-cloud flags; added cloud connectivity/auth/data probes.
  • Persistent brain prompt & injection workflow: brain_prompt.md is generated or refreshed at session_close via synthesized wisdom blocks; inject_brain_rules supports the persistent brain_prompt and records lesson_applications PENDING rows.
  • Lesson applications audit trail: lesson_applications rows created on injection and resolved (PENDING→CONFIRMED/REJECTED) at session_close using session events (CORRECTION, IMPLICIT_FEEDBACK, RULE_FAILURE).
  • Tests & docs: aggressive cleanup — many cloud-gated tests unskipped, new tests for doctor/cloud, synthesizer, lesson_applications; reduced skips/xfails to 3 skipped/0 xfailed; docs reposition cloud as visualization/dashboard only and record legacy cleanup in LEGACY_CLEANUP.md.
  • Breaking/API notes: diagnose(...) signature changed (breaking for callers relying on old signature). No other public API signatures were introduced as breaking changes; synthesize_rules_block is a new public function. No security fixes called out.

Walkthrough

Reframes the project to local-first execution (graduation, synthesis, promotion run in the SDK), removes cloud-gating semantics, adds local meta-rule synthesis and rule-synthesizer, hardens cloud sync and diagnostics, extends hook injection/session resolution with lesson_applications, and broadens test coverage to validate the local flows.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Docs & Policy | Gradata/docs/architecture/cloud-monolith-v2.md, Gradata/docs/architecture/multi-tenant-future-proofing.md, Gradata/docs/cloud/overview.md, Gradata/docs/cloud/dashboard.md, Gradata/docs/concepts/meta-rules.md, Gradata/docs/LEGACY_CLEANUP.md, .gitignore | Rewrote documentation to state SDK is local-first; cloud is visualization/sharing/backup only. Added legacy-cleanup checklist and ignored internal pre-launch doc. |
| Meta-rule Localization | Gradata/src/gradata/enhancements/meta_rules.py, Gradata/src/gradata/enhancements/rule_synthesizer.py | Implemented deterministic local meta-rule discovery/clustering and a cached rule-synthesizer (synthesize_rules_block) with Anthropic/CLI fallbacks and safe caching. |
| Rule Pipeline & Graduation | Gradata/src/gradata/enhancements/rule_pipeline.py, Gradata/src/gradata/enhancements/graduation/agent_graduation.py | Added canonical graduation step and change-detection promotion; adjusted rejection confidence update logic (+MISFIRE_PENALTY). |
| Hooks: Injection & Session Close | Gradata/src/gradata/hooks/inject_brain_rules.py, Gradata/src/gradata/hooks/session_close.py | Added brain_prompt persistence/early-return, lesson->rule dict with session-aware recency, write lesson_applications PENDING rows on injection, added resolution of pending applications and automatic brain_prompt refresh after session close. |
| Implicit Feedback | Gradata/src/gradata/hooks/implicit_feedback.py | Extended regex patterns for negation, reminders, challenges, and shorthand variants used for implicit-signal detection. |
| Cloud Sync & Client | Gradata/src/gradata/_cloud_sync.py, Gradata/src/gradata/cloud/client.py | Hardening: env-var alias helpers, table remapping, deterministic UUIDs, per-table transforms, payload scrubbing, batch splitting (500), in-batch dedupe; updated default cloud API base URL. |
| Diagnostics & CLI | Gradata/src/gradata/_doctor.py, Gradata/src/gradata/cli.py | Added cloud probes (config/env, TCP, auth, data-visibility) gated by new include_cloud/cloud_only flags, CLI flags --cloud/--no-cloud. |
| Tests: New & Updated | Gradata/tests/* (new: test_rule_synthesizer.py, test_implicit_feedback.py, test_doctor_cloud.py, test_lesson_applications.py; many existing tests updated to remove cloud skips / adjust assertions) | Added tests for synthesizer, implicit feedback, cloud doctor checks, lesson_applications; removed cloud gating/skips and updated assertions/confidence expectations across many test modules. |
| Misc (formatting/whitespace) | Gradata/tests/conftest.py, Gradata/src/gradata/enhancements/..., other files | Various formatting, reflowing, and small refactors without semantic changes beyond those noted above. |

Sequence Diagram(s)

sequenceDiagram
    participant Brain as Brain/<br/>lessons.md
    participant Injector as inject_brain_rules<br/>(hook)
    participant DB as system.db<br/>(lesson_applications)
    participant Session as session_close<br/>(hook)
    participant Synth as rule_synthesizer
    participant Events as Session<br/>Events

    Brain->>Injector: Read lessons & current_session
    Injector->>Injector: Transform lessons -> rule dicts
    Injector->>Injector: Check brain_prompt.md (AUTO-GENERATED)
    alt brain_prompt exists
        Injector-->>Brain: Return brain_prompt (early)
    else
        Injector->>Synth: Build rules block (synthesize_rules_block / cache)
        Synth-->>Injector: Return <brain-wisdom> or None
        Injector-->>Brain: Return composed rules block
    end
    Injector->>DB: Insert lesson_applications rows (PENDING)
    Note over Events,DB: End of session
    Session->>DB: Query PENDING lesson_applications for session
    Session->>Events: Fetch session events (CORRECTION, IMPLICIT_FEEDBACK, RULE_FAILURE)
    DB->>Session: Resolve outcomes (CONFIRMED / REJECTED) via heuristics
    Session->>DB: Update lesson_applications (outcome, success)
    Session->>Brain: Read post-graduation lessons
    Session->>Synth: Synthesize fresh <brain-wisdom> (cache/write)
    Synth-->>Session: Return wisdom block
    Session->>Brain: Write brain_prompt.md (auto-generated header + wisdom)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 54.59% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the three main changes: local-first meta-rules port, legacy-gate cleanup, and the graduation bug fix with MISFIRE_PENALTY sign correction. |
| Description check | ✅ Passed | The description is well-structured and directly related to the changeset, covering the local-first meta-rules port, the MISFIRE_PENALTY bug fix, test cleanup, and providing specific metrics (3820 passed, 3 skipped, 0 xfailed). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot added the bug and feature labels Apr 21, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Gradata/tests/conftest.py (1)

39-40: ⚠️ Potential issue | 🟠 Major

Use canonical brain env var in test setup/teardown.

init_brain() and the autouse cleanup currently only manage BRAIN_DIR. If runtime resolution now prefers GRADATA_BRAIN, tests can resolve a different brain path than intended and become order-dependent.

Suggested fix
 def init_brain(
@@
-    os.environ["BRAIN_DIR"] = str(brain_dir)
+    os.environ["GRADATA_BRAIN"] = str(brain_dir)
+    # Keep legacy alias for modules still reading BRAIN_DIR.
+    os.environ["BRAIN_DIR"] = str(brain_dir)
@@
 @pytest.fixture(autouse=True)
 def _isolate_brain_dir_env():
@@
-    prev = os.environ.get("BRAIN_DIR")
+    prev_gradata_brain = os.environ.get("GRADATA_BRAIN")
+    prev_brain_dir = os.environ.get("BRAIN_DIR")
     yield
-    if prev is None:
-        os.environ.pop("BRAIN_DIR", None)
+    if prev_gradata_brain is None:
+        os.environ.pop("GRADATA_BRAIN", None)
     else:
-        os.environ["BRAIN_DIR"] = prev
+        os.environ["GRADATA_BRAIN"] = prev_gradata_brain
+    if prev_brain_dir is None:
+        os.environ.pop("BRAIN_DIR", None)
+    else:
+        os.environ["BRAIN_DIR"] = prev_brain_dir

Based on learnings: cli.py uses env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), so test fixtures should align with that canonical key.

Also applies to: 111-116

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/tests/conftest.py` around lines 39 - 40, The tests set and cleanup
the non-canonical env var BRAIN_DIR which can cause order-dependent resolution;
update the fixture and autouse teardown in conftest.py so they set and restore
GRADATA_BRAIN instead of BRAIN_DIR (ensure the same changes are applied where
BRAIN_DIR is manipulated around Brain.init and init_brain()), i.e., export
GRADATA_BRAIN = str(brain_dir) before calling Brain.init() / init_brain() and
clear/restore GRADATA_BRAIN in the teardown to match cli.py's env-first
resolution.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Gradata/docs/cloud/dashboard.md`:
- Line 3: The page currently contradicts itself: the earlier sentence says
meta-rule synthesis runs locally but the "Brain detail" bullet still refers to
“cloud-synthesized principles”; update that bullet to use the local-synthesis
wording (e.g., replace “cloud-synthesized principles” with “locally synthesized
principles” or “principles rendered from local synthesis”) so it aligns with the
statement about brain.manifest.json and local synthesis.

In `@Gradata/docs/cloud/overview.md`:
- Line 17: The Cloud column for the table row with "Meta-rule **synthesis**
(local LLM via your own key or Claude Code Max OAuth)" is ambiguous because it
currently shows "Yes"; update that cell to a clarifying phrase such as "Mirrors
local results" (or "Mirrors local output") so it clearly indicates the cloud
column only reflects results from local synthesis rather than performing
cloud-side synthesis.

In `@Gradata/docs/concepts/meta-rules.md`:
- Around line 47-50: The admonition starting with !!! info "Local by default"
currently has the second paragraph indented as a code block which triggers
markdownlint MD046; edit the admonition body so the second paragraph is not
indented (remove the leading spaces or tabs before "Cloud becomes relevant...")
and ensure there's a single blank line between paragraphs so both remain normal
admonition text under the !!! info "Local by default" block.

In `@Gradata/docs/LEGACY_CLEANUP.md`:
- Around line 50-52: The guidance should avoid deleting tests wholesale;
instead, when you grep for the legacy symbol/docstring and remove the old code
path, update any tests that referenced that path (e.g., legacy-path tests) to
assert the new local-first behavior rather than deleting them, and then update
the module docstring to reflect the new behavior; specifically, find tests
exercising the legacy path, replace their assertions with ones that validate
local-first semantics, remove only obsolete scaffolding, and update the
module-level docstring to describe the new expected behavior and why the legacy
path was removed.

In `@Gradata/src/gradata/_cloud_sync.py`:
- Around line 329-333: The current _post function recursively calls itself for
batches larger than _POST_BATCH_SIZE which risks hitting Python's recursion
limit; refactor _post to iterate over rows in chunks of _POST_BATCH_SIZE and
call a non-recursive helper (e.g., _post_single_batch) for each chunk, summing
returns into total and returning total, or inline the existing single-batch
logic into the loop body so no self-call to _post occurs; update references to
table, rows and _POST_BATCH_SIZE and ensure _post_single_batch (or the inlined
logic) returns the per-batch count so the totals accumulate correctly.
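
A minimal sketch of the iterative shape this comment asks for, assuming a per-batch callable that returns the number of accepted rows. `post_in_chunks` and the `post_single_batch` parameter are hypothetical names; the batch size of 500 is the value reported in the review summary.

```python
from typing import Callable, Sequence

_POST_BATCH_SIZE = 500  # batch size reported in the review summary


def post_in_chunks(
    rows: Sequence[dict],
    post_single_batch: Callable[[Sequence[dict]], int],
    batch_size: int = _POST_BATCH_SIZE,
) -> int:
    """Send rows in fixed-size slices with a plain loop, no self-recursion."""
    total = 0
    for start in range(0, len(rows), batch_size):
        total += post_single_batch(rows[start:start + batch_size])
    return total
```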

In `@Gradata/src/gradata/_doctor.py`:
- Around line 335-343: Replace the try/except/pass used to decode the HTTPError
body with contextlib.suppress for clarity: import contextlib and in the except
urllib.error.HTTPError as e handler use "with contextlib.suppress(Exception):"
around the body = e.read(512).decode("utf-8", errors="replace") statement so
that any decode/read errors are ignored but the function still returns (e.code,
body). Keep the surrounding except (urllib.error.URLError, OSError) handler
unchanged.
- Around line 305-320: The host/port parsing in _check_cloud_reachable is wrong
for URLs with explicit ports: parse api_url with urllib.parse.urlparse inside
_check_cloud_reachable to get parsed.hostname and parsed.port, determine port =
parsed.port or (443 if parsed.scheme == "https" else 80), use
socket.create_connection((hostname, port), timeout=_CLOUD_PROBE_TIMEOUT), and
update the returned "detail" strings to include the actual hostname and port
rather than hardcoding 443 so URLs like http://localhost:8080/api/v1 are probed
correctly.
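
The host/port derivation suggested above can be sketched with `urllib.parse.urlparse`. The helper name `host_and_port` is hypothetical; the socket call itself is omitted so the sketch stays free of network access.

```python
from urllib.parse import urlparse


def host_and_port(api_url: str) -> tuple[str, int]:
    """Honor an explicit port; otherwise fall back to the scheme default."""
    parsed = urlparse(api_url)
    if not parsed.hostname:
        raise ValueError(f"no host in URL: {api_url!r}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port
```

The resulting pair can be passed directly to `socket.create_connection((host, port), timeout=...)`.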

In `@Gradata/src/gradata/cli.py`:
- Around line 225-231: The --cloud / --no-cloud flags are not mutually exclusive
so args can have conflicting values; update the CLI arg parsing to prevent this
by defining these flags in an argparse mutually exclusive group (use
parser.add_mutually_exclusive_group()) and add the two flags to that group
(e.g., group.add_argument("--cloud", action="store_true", dest="cloud") and
group.add_argument("--no-cloud", action="store_true", dest="no_cloud") or
similar) so they cannot be passed together, and keep the rest of the code using
getattr(args, "cloud", False) and getattr(args, "no_cloud", False) unchanged;
alternatively, if you prefer runtime validation, add a simple check after
parse_args that raises parser.error or prints a clear message when both
args.cloud and args.no_cloud are True and refuse to continue (affecting how
diagnose() is called with cloud_only and include_cloud).
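
For reference, the argparse variant of this suggestion looks roughly like the sketch below. The `build_parser` function and the `prog` value are hypothetical; the real `cli.py` wires these flags into a subcommand that is not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser whose --cloud and --no-cloud flags cannot be combined."""
    parser = argparse.ArgumentParser(prog="doctor")  # hypothetical prog name
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--cloud", action="store_true", dest="cloud")
    group.add_argument("--no-cloud", action="store_true", dest="no_cloud")
    return parser
```

Passing both flags makes argparse print a usage error and exit, so no runtime check is needed afterwards.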

In `@Gradata/src/gradata/enhancements/meta_rules.py`:
- Around line 364-369: The call to _cluster_by_similarity in the loop uses a
hardcoded 0.20 threshold while the function's default is 0.35 (inconsistent
behavior); change this by extracting the thresholds into named constants (e.g.
DEFAULT_SIMILARITY_THRESHOLD = 0.35 and SUBGROUP_SIMILARITY_THRESHOLD = 0.20)
and use the SUBGROUP_SIMILARITY_THRESHOLD constant in the call inside merge
logic (where min_group_size and merge_into_meta are used), and add a short
comment/docstring either above the constants or on _cluster_by_similarity
explaining why different thresholds are used so intent is clear.

In `@Gradata/src/gradata/enhancements/rule_synthesizer.py`:
- Around line 220-234: The SDK response handling in rule_synthesizer.py assumes
msg.content[0].text exists; update the block after client.messages.create (the
variable msg and the assignment to raw) to defensively check that msg and
getattr(msg, "content", None) are present and that len(msg.content) > 0 and the
first item has a non-empty .text attribute before accessing it; if the content
is missing/empty, log a debug/error indicating empty SDK response and raise or
set raw to an empty string (or raise to trigger the existing except branch so
the CLI fallback runs), ensuring client.messages.create usage and the variable
names msg, raw, and provider_used remain consistent.
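
One possible shape for the defensive check is a small helper that raises when the response carries no text, so the existing except branch can trigger the CLI fallback. `extract_text` is a hypothetical name, and the sketch only assumes that the response object exposes `content[0].text` as described in this comment.

```python
def extract_text(msg) -> str:
    """Return the first content block's text, or raise if it is missing or blank."""
    content = getattr(msg, "content", None) or []
    if not content:
        raise ValueError("empty SDK response")
    text = getattr(content[0], "text", "") or ""
    if not text.strip():
        raise ValueError("SDK response has no text")
    return text
```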

In `@Gradata/src/gradata/hooks/implicit_feedback.py`:
- Around line 31-36: In Gradata/src/gradata/hooks/implicit_feedback.py you have
duplicated regexes for negations (both apostrophed and apostrophe-less forms) —
remove the redundant apostrophe-less entries (the re.compile(r"\bdont\b", re.I),
re.compile(r"\bcant\b", re.I), re.compile(r"\bshouldnt\b", re.I)) so only the
apostrophed patterns (re.compile(r"\bdon'?t\b", re.I), re.compile(r"\bcan'?t\b",
re.I), re.compile(r"\bshouldn'?t\b", re.I)) remain; update the list where these
compiled patterns are defined to eliminate the duplicates and keep intent clear.
- Line 49: The standalone pattern re.compile(r"\bagain\.?\.?\b", re.I) in
implicit_feedback.py is too broad and should be tightened so casual retry
requests (e.g., "run that again") don't trigger reminder/learning events;
replace it with a regex that requires "again" to co-occur with reminder
verbs/phrases (e.g., match constructs like "remind me .* again", "don't forget
.* again", or "again .* remember/please") or remove the standalone "again" rule
entirely; update the pattern used where re.compile(r"\bagain\.?\.?\b", re.I)
appears to a combined expression that enforces surrounding reminder context
(e.g., require "remind", "remember", "don't forget", or "please" near "again")
so only true reminders are captured.
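
Both points about this file can be checked with a short sketch: an optional apostrophe already covers the shorthand form, and "again" can be required to follow a reminder phrase. The `REMINDER_AGAIN` pattern is one hypothetical tightening, not the pattern the project adopted.

```python
import re

# The optional apostrophe means a separate r"\bdont\b" pattern is redundant.
NEGATION = re.compile(r"\bdon'?t\b", re.I)

# Hypothetical tightening: "again" only counts after a reminder phrase.
REMINDER_AGAIN = re.compile(
    r"\b(?:remind|remember|don'?t forget)\b.*\bagain\b", re.I
)
```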

In `@Gradata/src/gradata/hooks/inject_brain_rules.py`:
- Around line 607-615: The fast-path that returns the persistent brain prompt
(bp_text from _read_brain_prompt(Path(brain_dir))) must run before any
manifest/audit side-effects; move the bp_text check and immediate return to the
top of the inject_brain_rules hook so it executes before the code that writes
.last_injection.json and before the code that inserts lesson_applications
PENDING rows for rules_block selections. Locate the bp_text =
_read_brain_prompt(...) block and place the if bp_text: return {"result":
bp_text} before all manifest/audit write calls to ensure no writes occur when
the persistent brain-prompt path is used.
- Around line 529-535: The mandatory rule lines are emitted raw into the
<mandatory-directives> block (built as mandatory_lines using lesson.category and
lesson.description), allowing injected markup; update creation of
mandatory_lines to sanitize both lesson.category and lesson.description via the
existing sanitize_lesson_content(..., "xml") helper (the same way regular
rule/cluster paths do) before formatting the "[MANDATORY] {category}:
{description}" strings so any XML-like characters are escaped.
- Around line 100-120: In _lesson_to_rule_dict, missing or None
sessions_since_fire is being coerced to 0 which sets
last_session=current_session (giving unknown lessons max recency); change the
logic to treat absent/None sessions_since_fire as unknown and set last_session=0
(neutral) instead of deriving current_session - sessions_since. For the object
branch (attribute access) locate sessions_since = int(getattr(...)) and replace
with a check that preserves None/absence (e.g. raw = getattr(lesson,
"sessions_since_fire", None); if raw is None: last_session = 0 else validate raw
>= 0 and compute last_session = max(0, current_session - int(raw)) only when
current_session>0). Do the same in the dict branch (inspect
d.get("sessions_since_fire") and compute/assign d["last_session"] accordingly)
so dict inputs also get the recency fix.

In `@Gradata/src/gradata/hooks/session_close.py`:
- Around line 305-309: The description-based rejection loop that currently
checks if desc[:30] is in lesson_desc (iterating rejecting_descriptions and
setting outcome = "REJECTED") is too permissive; replace that 30-char prefix
substring check with a stronger similarity test such as using
difflib.SequenceMatcher to compute a similarity ratio between desc and
lesson_desc and require a high threshold (e.g., ratio > 0.8), or require a much
longer contiguous match (e.g., common substring length >= 100) before setting
outcome = "REJECTED"; update the loop that references rejecting_descriptions,
desc, lesson_desc and outcome to use the chosen similarity check.
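
A sketch of the similarity-based check using `difflib.SequenceMatcher` from the standard library. The 0.8 threshold is the value suggested above; the whitespace and case normalization and the helper name `is_same_lesson` are assumptions.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # value suggested in the review; tune against real data


def is_same_lesson(desc: str, lesson_desc: str,
                   threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """True when the two descriptions are near-duplicates after normalization."""
    a = " ".join(desc.lower().split())
    b = " ".join(lesson_desc.lower().split())
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() > threshold
```

Two descriptions that merely share a 30-character prefix score well below the threshold here, which is the false positive the prefix check allowed.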

In `@Gradata/tests/test_implicit_feedback.py`:
- Around line 91-96: Add neutral-precision unit tests in TestEdgeCases to ensure
the _detect_signals function does not falsely emit reminder/negation signals for
benign uses of the tokens "again" and "never": add assertions calling
_detect_signals with neutral sentences that include "again" (e.g.,
conversational/temporal uses) and "never" (e.g., idiomatic/benign phrases) and
assert the result is an empty list so reminder/negation signals are not
produced.

---

Outside diff comments:
In `@Gradata/tests/conftest.py`:
- Around line 39-40: The tests set and cleanup the non-canonical env var
BRAIN_DIR which can cause order-dependent resolution; update the fixture and
autouse teardown in conftest.py so they set and restore GRADATA_BRAIN instead of
BRAIN_DIR (ensure the same changes are applied where BRAIN_DIR is manipulated
around Brain.init and init_brain()), i.e., export GRADATA_BRAIN = str(brain_dir)
before calling Brain.init() / init_brain() and clear/restore GRADATA_BRAIN in
the teardown to match cli.py's env-first resolution.
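
The set-and-restore behavior can be sketched with a small context manager. This is an illustration under assumptions: `brain_env` is a hypothetical helper, and in a pytest fixture the built-in `monkeypatch.setenv` would give the same restore-on-teardown effect.

```python
import os
from contextlib import contextmanager


@contextmanager
def brain_env(brain_dir):
    """Point GRADATA_BRAIN at brain_dir, then restore the previous value."""
    previous = os.environ.get("GRADATA_BRAIN")
    os.environ["GRADATA_BRAIN"] = str(brain_dir)
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop("GRADATA_BRAIN", None)
        else:
            os.environ["GRADATA_BRAIN"] = previous
```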

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: faa0b3cb-e20d-4927-afce-e250bfaa1abe

📥 Commits

Reviewing files that changed from the base of the PR and between 5ef1a1e and 03ddb6f.

📒 Files selected for processing (34)
  • .gitignore
  • Gradata/docs/LEGACY_CLEANUP.md
  • Gradata/docs/architecture/cloud-monolith-v2.md
  • Gradata/docs/architecture/multi-tenant-future-proofing.md
  • Gradata/docs/cloud/dashboard.md
  • Gradata/docs/cloud/overview.md
  • Gradata/docs/concepts/meta-rules.md
  • Gradata/src/gradata/_cloud_sync.py
  • Gradata/src/gradata/_doctor.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/cloud/client.py
  • Gradata/src/gradata/enhancements/graduation/agent_graduation.py
  • Gradata/src/gradata/enhancements/meta_rules.py
  • Gradata/src/gradata/enhancements/rule_pipeline.py
  • Gradata/src/gradata/enhancements/rule_synthesizer.py
  • Gradata/src/gradata/hooks/implicit_feedback.py
  • Gradata/src/gradata/hooks/inject_brain_rules.py
  • Gradata/src/gradata/hooks/session_close.py
  • Gradata/tests/conftest.py
  • Gradata/tests/test_agent_graduation.py
  • Gradata/tests/test_bug_fixes.py
  • Gradata/tests/test_cloud_row_push.py
  • Gradata/tests/test_doctor_cloud.py
  • Gradata/tests/test_implicit_feedback.py
  • Gradata/tests/test_integration_workflow.py
  • Gradata/tests/test_lesson_applications.py
  • Gradata/tests/test_llm_synthesizer.py
  • Gradata/tests/test_mem0_adapter.py
  • Gradata/tests/test_meta_rule_generalization.py
  • Gradata/tests/test_meta_rules.py
  • Gradata/tests/test_multi_brain_simulation.py
  • Gradata/tests/test_pipeline_e2e.py
  • Gradata/tests/test_rule_pipeline.py
  • Gradata/tests/test_rule_synthesizer.py
💤 Files with no reviewable changes (2)
  • Gradata/tests/test_bug_fixes.py
  • Gradata/tests/test_multi_brain_simulation.py
📜 Review details
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.

Applied to files:

  • Gradata/tests/test_cloud_row_push.py
  • Gradata/docs/architecture/multi-tenant-future-proofing.md
  • Gradata/tests/conftest.py
  • Gradata/tests/test_integration_workflow.py
  • Gradata/src/gradata/cli.py
  • Gradata/tests/test_agent_graduation.py
  • Gradata/tests/test_doctor_cloud.py
  • Gradata/tests/test_pipeline_e2e.py
  • Gradata/tests/test_lesson_applications.py
  • Gradata/src/gradata/hooks/session_close.py
  • Gradata/src/gradata/_cloud_sync.py
🪛 LanguageTool
Gradata/docs/cloud/overview.md

[style] ~3-~3: ‘on top of that’ might be wordy. Consider a shorter alternative.
Context: ...uity, team sharing, and managed backups on top of that local loop. ## What's in the SDK vs th...

(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)

🪛 markdownlint-cli2 (0.22.0)
Gradata/docs/concepts/meta-rules.md

[warning] 50-50: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🪛 Ruff (0.15.10)
Gradata/src/gradata/_doctor.py

[warning] 337-340: Use contextlib.suppress(Exception) instead of try-except-pass

Replace try-except-pass with with contextlib.suppress(Exception): ...

(SIM105)
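
For reference, the SIM105 rewrite looks roughly like this. `close_quietly` and the `resource` argument are hypothetical; the actual code at lines 337-340 of `_doctor.py` is not shown here.

```python
import contextlib


def close_quietly(resource) -> None:
    """Best-effort cleanup: ignore any exception raised by close()."""
    # Equivalent to: try: resource.close() / except Exception: pass
    with contextlib.suppress(Exception):
        resource.close()
```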

🔇 Additional comments (40)
.gitignore (1)

138-138: Good privacy guard addition.

Adding Gradata/docs/pre-launch-plan.md to ignore rules is consistent with the private-draft protections in this block and aligns with the PR’s redaction intent.

Gradata/tests/test_mem0_adapter.py (5)

3-3: Docstring update is accurate and helpful.

This clearly documents the offline-only testing contract for this module.


89-89: Compact fake-client setup remains clear and correct.

No behavioral change here; the test still validates results envelope ID extraction as intended.


236-236: Bare-list response test still reads well after compaction.

The normalization path for minimal search payloads remains explicitly covered.


249-249: Old-SDK fallback scenario is preserved correctly.

This still exercises the retry path when filters is unsupported.


269-269: Log assertion refactor is equivalent and clean.

The failure-path warning check remains intact.

Gradata/tests/test_integration_workflow.py (2)

1-5: Good stale-gate cleanup and accurate local-pipeline framing.

The updated docstring correctly reflects this test module’s intent after removing API-key coupling for these workflow checks.


12-12: Module-level integration marker is correctly simplified.

Keeping pytestmark as integration-only here is consistent with the local-first test behavior and avoids unnecessary environment-based skips.

Gradata/src/gradata/cloud/client.py (1)

68-75: _post(...) callsite refactor looks clean

These are formatting-only callsite changes; request paths and payload shapes remain intact, so behavior is preserved.

Also applies to: 132-138

Gradata/src/gradata/hooks/implicit_feedback.py (1)

19-23: Good expansion of shorthand correction/challenge coverage.

These additions improve recall for real-world text-speak without changing core control flow.

Also applies to: 58-63

Gradata/tests/test_implicit_feedback.py (1)

23-84: Nice coverage of new shorthand/reminder/challenge variants.

This materially improves regression safety for the regex expansion.

Gradata/tests/test_meta_rule_generalization.py (1)

63-82: LGTM — test unskipped and updated for local meta-rule discovery.

The test now exercises discover_meta_rules() locally with more descriptive lesson content. The updated descriptions provide clearer semantic signals for the clustering algorithm to detect patterns.

Gradata/src/gradata/enhancements/graduation/agent_graduation.py (2)

536-537: Correct fix for the double-negative bug.

The previous code lesson.confidence - MISFIRE_PENALTY with MISFIRE_PENALTY = -0.15 resulted in confidence - (-0.15) = confidence + 0.15, incorrectly increasing confidence on rejection. The fix to + MISFIRE_PENALTY now correctly decreases confidence, aligning with the canonical usage in _confidence.py.
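
A small worked example of the sign error, using the constant value quoted above. The function names are illustrative only.

```python
MISFIRE_PENALTY = -0.15  # negative constant, as quoted in the PR


def penalize_buggy(confidence: float) -> float:
    # Old behavior: subtracting a negative constant raises confidence.
    return confidence - MISFIRE_PENALTY


def penalize_fixed(confidence: float) -> float:
    # Fixed behavior: adding the negative constant lowers confidence, floored at 0.
    return max(0.0, confidence + MISFIRE_PENALTY)
```

Starting from a confidence of 0.5, the buggy form moves the value up by 0.15 and the fixed form moves it down by 0.15.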


76-80: LGTM — formatting-only changes.

These constant definitions and dataclass field comments are reformatted for readability with no semantic changes.

Also applies to: 93-95, 132-134

Gradata/src/gradata/enhancements/meta_rules.py (4)

273-296: Greedy clustering algorithm looks correct.

The single-pass greedy clustering approach is appropriate for the expected scale (tens of lessons). The centroid-based similarity comparison with threshold gating is a standard pattern.
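
A sketch of single-pass greedy clustering with centroid comparison. It assumes lessons are already embedded as equal-length float vectors, uses cosine similarity, and joins the best-matching cluster at or above the threshold; the real `meta_rules.py` may differ on each of those points, and the 0.75 threshold is a placeholder.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_cluster(vectors, threshold: float = 0.75):
    """Assign each vector, in order, to the most similar existing cluster."""
    clusters = []  # each entry: {"members": [indices], "centroid": [floats]}
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"members": [idx], "centroid": list(vec)})
        else:
            n = len(best["members"])
            # Running mean keeps the centroid in step with its members.
            best["centroid"] = [(c * n + v) / (n + 1)
                                for c, v in zip(best["centroid"], vec)]
            best["members"].append(idx)
    return [c["members"] for c in clusters]
```

Because each vector is visited once, the result depends on input order, which is the usual trade-off of the greedy approach.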


344-354: Zombie filter implementation matches PR specification.

The filter correctly excludes RULE-state lessons with confidence below 0.90 while allowing PATTERN-state lessons at any confidence. The noise pattern filter also helps exclude auto-generated descriptions that aren't real user corrections.
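
The zombie rule described above can be sketched as follows. The `Lesson` dataclass is a hypothetical stand-in for the SDK type, and the noise-pattern filter is left out because its patterns are not shown here.

```python
from dataclasses import dataclass

RULE_MIN_CONFIDENCE = 0.90  # cutoff named in the PR summary


@dataclass
class Lesson:  # hypothetical stand-in for the SDK's lesson type
    state: str
    confidence: float


def is_zombie(lesson: Lesson) -> bool:
    """A RULE-state lesson whose confidence has dropped below the cutoff."""
    return lesson.state == "RULE" and lesson.confidence < RULE_MIN_CONFIDENCE


def drop_zombies(lessons):
    return [lesson for lesson in lessons if not is_zombie(lesson)]
```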


405-409: Confidence smoothing formula is correct.

The count / (count + 3.0) formula is a standard shrinkage estimator: it equals the mean of a Beta(count, 3) distribution, i.e. the raw count discounted by three pseudo-observations. This prevents small clusters from having unrealistically high confidence.
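
A quick illustration of how `count / (count + 3)` behaves for small and large clusters. The function and parameter names are illustrative.

```python
def smoothed_confidence(count: int, pseudo_count: float = 3.0) -> float:
    """Shrink confidence toward 0 for small clusters: count / (count + 3)."""
    return count / (count + pseudo_count)


# A cluster needs three members to reach 0.5, and the value approaches
# (but never reaches) 1.0 as the cluster grows.
values = {n: smoothed_confidence(n) for n in (1, 3, 10, 100)}
```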


1217-1230: Good visibility for deterministic fallback limitation.

The warning clearly explains the capability gap when LLM credentials are missing, noting that deterministic meta-rules will be stored but not injected. This prevents silent 100% discard of synthesized rules.

Gradata/src/gradata/enhancements/rule_pipeline.py (2)

434-443: Good refactor to canonical graduation.

The change to call _graduate(all_lessons) instead of inline threshold conditionals ensures consistency with the authoritative graduation logic. The state-change detection gate (pre_states.get(id(lesson)) != lesson.state) correctly records only lessons that actually transitioned to PATTERN or RULE.


483-483: Correct import for RULE_THRESHOLD.

Importing RULE_THRESHOLD from the canonical _confidence module ensures the hook promotion gate uses the same constant as the graduation logic.

Gradata/tests/test_rule_pipeline.py (2)

109-129: Test correctly updated for H1 strict threshold semantics.

Using confidence=0.65 (above the 0.60 threshold) ensures the lesson can graduate under the strict > comparison. The docstring accurately explains that lessons must earn at least one bonus to clear the threshold.


132-153: Excellent boundary test for H1 fix.

This test validates that a lesson at exactly PATTERN_THRESHOLD (0.60) does NOT graduate, enforcing the strict > semantics that prevent "promotion from spawn." The assertions correctly verify that INSTINCT remains and PATTERN is not written.
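
The boundary behavior comes down to a strict comparison. A minimal sketch, with the threshold value taken from the comment above and a hypothetical helper name:

```python
PATTERN_THRESHOLD = 0.60  # value cited in the review comment


def clears_pattern_threshold(confidence: float) -> bool:
    # Strict ">" means a lesson sitting exactly on the threshold does not graduate.
    return confidence > PATTERN_THRESHOLD
```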

Gradata/src/gradata/_cloud_sync.py (3)

97-115: Robust sanitization for Postgres JSONB compatibility.

The _scrub() function correctly handles NUL bytes (which Postgres JSONB rejects) and unpaired UTF-16 surrogates. The recursive approach ensures nested structures are fully sanitized.
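
A sketch of a recursive scrub along these lines. This is not the project's `_scrub()`; it assumes that dropping NUL characters and lone surrogates is acceptable and relies on the fact that encoding to UTF-8 with `errors="ignore"` discards unpaired surrogate code points.

```python
def scrub(value):
    """Recursively remove NUL bytes and lone surrogates from strings."""
    if isinstance(value, str):
        cleaned = value.replace("\x00", "")
        # Lone surrogates cannot be encoded as UTF-8, so "ignore" drops them.
        return cleaned.encode("utf-8", "ignore").decode("utf-8")
    if isinstance(value, dict):
        return {scrub(k): scrub(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [scrub(v) for v in value]
    return value
```

Non-string scalars pass through unchanged, and tuples come back as lists, which matches what JSON serialization would produce anyway.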


118-157: Well-structured row transformation for cloud schema.

The _transform_row() function correctly maps local SQLite columns to the narrower cloud schema with:

  • Deterministic UUIDs from local integer IDs
  • tenant_id → brain_id rename
  • Session type coercion with fallback to data.session_raw
  • JSON parsing with safe defaults
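
A partial sketch of the mapping listed above, covering the deterministic UUID, the rename, and the JSON parsing. The namespace value, the `table:id` key format, and the function name are assumptions, and the session coercion step is omitted because its rules are not shown here.

```python
import json
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def transform_row(table: str, row: dict) -> dict:
    """Map a local row to a cloud-shaped row (subset of fields only)."""
    out = {
        # uuid5 is a pure function of its inputs, so re-pushing a row
        # yields the same cloud ID every time.
        "id": str(uuid.uuid5(NAMESPACE, f"{table}:{row['id']}")),
        "brain_id": row["tenant_id"],
    }
    raw = row.get("data")
    try:
        out["data"] = json.loads(raw) if isinstance(raw, str) else (raw or {})
    except json.JSONDecodeError:
        out["data"] = {}  # safe default for malformed JSON
    return out
```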

411-422: Good error isolation for malformed rows.

Catching transform exceptions per-row and logging warnings ensures one bad row doesn't block the entire batch. Setting all_ok = False correctly prevents watermark advancement when rows are skipped.
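
The per-row isolation pattern, sketched with a generic `transform` callable. The names are illustrative; the point is that a failed row is logged and skipped while the `all_ok` flag records that the batch was incomplete.

```python
import logging

log = logging.getLogger(__name__)


def transform_batch(rows, transform):
    """Transform rows one by one; skip failures and report whether all passed."""
    out, all_ok = [], True
    for row in rows:
        try:
            out.append(transform(row))
        except Exception as exc:  # one bad row must not block the batch
            log.warning("skipping malformed row: %s", exc)
            all_ok = False
    return out, all_ok
```

A caller would advance its watermark only when `all_ok` is true, so skipped rows are retried on the next sync.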

Gradata/tests/test_cloud_row_push.py (1)

67-67: Correct test update for transformed row shape.

The assertion change from tenant_id to brain_id correctly reflects the _transform_row() output for the events table, which maps tenant_id to brain_id for the cloud schema.

Gradata/docs/architecture/cloud-monolith-v2.md (1)

8-12: Documentation accurately reflects the local-first architecture.

The updated description clearly articulates the responsibility split: local SQLite is the source of truth for graduation/synthesis/rule-to-hook, while the cloud serves as a downstream reflection layer for visualization, sharing, and backups. This aligns with the PR's "local by default" behavioral engine model.

Gradata/tests/test_meta_rules.py (1)

85-87: Confidence expectation now correctly matches implementation.

This assertion aligns with merge_into_meta() smoothing (count / (count + 3.0)) and protects against regressions in meta confidence computation.

Gradata/tests/test_llm_synthesizer.py (1)

123-140: Good deterministic coverage for merge_into_meta().

This test clearly locks in the local deterministic contract (meta.principle present, META- ID format) while keeping LLM synthesis concerns separate.

Gradata/docs/architecture/multi-tenant-future-proofing.md (1)

16-17: Local-first/cloud-boundary wording is clear and consistent.

These updates remove cloud-gate ambiguity and keep the execution model aligned across the architecture narrative and verification flow.

Also applies to: 22-23, 39-42, 119-122

Gradata/tests/test_lesson_applications.py (1)

1-147: LGTM!

This test file provides comprehensive coverage for the lesson_applications audit trail:

  • Verifies PENDING rows are written during injection
  • Confirms session_close resolves PENDING→CONFIRMED when no category-matching correction exists
  • Confirms PENDING→REJECTED when a CORRECTION shares the lesson's category
  • Validates IMPLICIT_FEEDBACK events also trigger rejection
  • Ensures graceful degradation when system.db is absent
Gradata/tests/test_rule_synthesizer.py (1)

1-118: LGTM!

The test file thoroughly validates the fail-safe contracts documented in rule_synthesizer.py:

  • Returns None when both providers are unavailable
  • Short-circuits on empty inputs without touching providers
  • Returns cached content on cache hit
  • Falls back to CLI when SDK initialization fails
  • Returns None (not raises) on malformed output
Gradata/tests/test_agent_graduation.py (1)

121-141: LGTM! The xfail removals align with the graduation bug fix.

The PR description notes that MISFIRE_PENALTY = -0.15 was being subtracted, so the double negative added confidence on rejection. The relevant code snippet shows the fix now uses max(0.0, lesson.confidence + MISFIRE_PENALTY), correctly decreasing confidence.

The flexible assertion l.state in (LessonState.PATTERN, LessonState.RULE) is appropriate since the final state depends on whether thresholds for RULE promotion are also met.

Gradata/tests/test_pipeline_e2e.py (2)

110-150: LGTM! Cloud-gate removal and confidence assertions are correct.

The test now directly exercises the local-first discover_meta_rules without cloud gating. The confidence assertion >= 0.5 is mathematically guaranteed by the count/(count+3) formula in merge_into_meta: with 3+ lessons, confidence ≥ 3/6 = 0.5.


285-317: Good substitution of auto-generated descriptions with original correction text.

The comment explains why this is necessary: auto-generated edit-distance descriptions may be filtered by the meta-synthesis noise filter. Using the original final text from corrections mirrors what graduation's LLM principle distillation does in production.

Gradata/src/gradata/enhancements/rule_synthesizer.py (3)

1-24: Well-documented design contracts.

The module docstring clearly articulates the fail-safe contract, two-provider fallback strategy, and caching behavior. This is excellent documentation for maintainability.


270-281: No action needed. The --output-format text flag is a valid Claude CLI option (the default format for print mode) and is correctly used with the -p flag in this invocation.


37-40: No action needed—claude-opus-4-7 is a valid Anthropic model.

The model name is confirmed as valid and currently available through the Anthropic API as of April 2026. It's the flagship model for complex tasks and compatible with both SDK and CLI implementations.

Gradata/src/gradata/hooks/session_close.py (1)

219-231: No issue found — the code design is intentional.

The _read_brain_prompt() function is explicitly designed to add the <brain-wisdom> wrapper tags on read if they are not present (see lines 95–96 in inject_brain_rules.py). Stripping the tags in session_close.py before writing (lines 222–225) and storing the unwrapped content (line 231) is the correct approach; the tags will be added back when the file is read by inject_brain_rules.py.
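
The strip-on-write, wrap-on-read round trip can be sketched as below. The closing tag `</brain-wisdom>` and both helper names are assumptions; only the opening tag is named in the comment above.

```python
OPEN_TAG, CLOSE_TAG = "<brain-wisdom>", "</brain-wisdom>"


def strip_wrapper(text: str) -> str:
    """Remove the wrapper tags before persisting (write side)."""
    body = text.strip()
    if body.startswith(OPEN_TAG) and body.endswith(CLOSE_TAG):
        body = body[len(OPEN_TAG):-len(CLOSE_TAG)].strip()
    return body


def ensure_wrapper(text: str) -> str:
    """Add the wrapper tags if they are missing (read side)."""
    body = text.strip()
    if body.startswith(OPEN_TAG):
        return body
    return f"{OPEN_TAG}\n{body}\n{CLOSE_TAG}"
```

Because `ensure_wrapper` is idempotent, it does not matter whether the stored file was written with or without tags.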

Gradata/tests/test_doctor_cloud.py (1)

15-127: Good deterministic coverage for the offline cloud helpers.

The temp-config fixture plus _probe_api patching keeps the config/auth/data branches isolated from the developer machine and real network calls.

# Dashboard

-The Gradata Cloud dashboard is a Next.js app at [app.gradata.ai](https://app.gradata.ai). It wraps the same data the local `brain.manifest.json` exposes, plus Cloud-only views for meta-rule synthesis, team management, and the operator console.
+The Gradata Cloud dashboard is a Next.js app at [app.gradata.ai](https://app.gradata.ai). It visualizes the same data the local `brain.manifest.json` exposes, plus Cloud-only views for team management and the operator console. Meta-rule synthesis runs locally in the SDK — the dashboard renders the results, it does not re-run them.

⚠️ Potential issue | 🟡 Minor

Resolve meta-rule source contradiction in this page.

Line 3 correctly states synthesis is local, but the Brain detail bullet still says “cloud-synthesized principles” (Line 26). Please align terminology to “locally synthesized” or “rendered from local synthesis.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/cloud/dashboard.md` at line 3, The page currently contradicts
itself: the earlier sentence says meta-rule synthesis runs locally but the
"Brain detail" bullet still refers to “cloud-synthesized principles”; update
that bullet to use the local-synthesis wording (e.g., replace “cloud-synthesized
principles” with “locally synthesized principles” or “principles rendered from
local synthesis”) so it aligns with the statement about brain.manifest.json and
local synthesis.

| Cross-platform export (`.cursorrules`, `BRAIN-RULES.md`, ...) | Yes | Yes |
| Meta-rule **clustering** | Yes | Yes |
-| Meta-rule **synthesis** (LLM-generated principles) | Placeholder | Yes |
+| Meta-rule **synthesis** (local LLM via your own key or Claude Code Max OAuth) | Yes | Yes |

⚠️ Potential issue | 🟡 Minor

Clarify Cloud column for local-only synthesis row.

Because the row explicitly says synthesis is local LLM, marking Cloud as “Yes” is ambiguous and can imply cloud-side synthesis. Consider “Mirrors local results” (or similar) for the Cloud column.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/cloud/overview.md` at line 17, The Cloud column for the table
row with "Meta-rule **synthesis** (local LLM via your own key or Claude Code Max
OAuth)" is ambiguous because it currently shows "Yes"; update that cell to a
clarifying phrase such as "Mirrors local results" (or "Mirrors local output") so
it clearly indicates the cloud column only reflects results from local synthesis
rather than performing cloud-side synthesis.

Comment on lines +47 to +50
 !!! info "Local by default"
     Meta-rule clustering **and** principle synthesis both run locally. Synthesis uses whichever LLM path you've configured: your own Anthropic API key (set `ANTHROPIC_API_KEY`) or the Claude Code Max OAuth path via `claude -p`. Cloud is not required for any of it — the full `[rule, rule, rule] → "Verify before acting"` pipeline runs in the OSS SDK.

-The math, the events, and the storage are all open. Only the LLM-driven synthesis that turns `[rule, rule, rule] → "Verify before acting"` is cloud-gated.
+    Cloud becomes relevant when you want a hosted dashboard, cross-device sync, team brains, or (future) opt-in corpus donation. It does not re-synthesize or override what graduated locally.

⚠️ Potential issue | 🟡 Minor

Fix admonition body formatting to avoid MD046.

The second paragraph in this !!! info block is currently parsed as an indented code block by markdownlint. Keep the body as continuous admonition text (no code-style indentation break).

Suggested fix
 !!! info "Local by default"
     Meta-rule clustering **and** principle synthesis both run locally. Synthesis uses whichever LLM path you've configured: your own Anthropic API key (set `ANTHROPIC_API_KEY`) or the Claude Code Max OAuth path via `claude -p`. Cloud is not required for any of it — the full `[rule, rule, rule] → "Verify before acting"` pipeline runs in the OSS SDK.
-
     Cloud becomes relevant when you want a hosted dashboard, cross-device sync, team brains, or (future) opt-in corpus donation. It does not re-synthesize or override what graduated locally.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 50-50: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/concepts/meta-rules.md` around lines 47 - 50, The admonition
starting with !!! info "Local by default" currently has the second paragraph
indented as a code block which triggers markdownlint MD046; edit the admonition
body so the second paragraph is not indented (remove the leading spaces or tabs
before "Cloud becomes relevant...") and ensure there's a single blank line
between paragraphs so both remain normal admonition text under the !!! info
"Local by default" block.

Comment on lines +50 to +52
1. Grep for the symbol / doc string.
2. Delete the code path and any tests that exercise it.
3. Update the module docstring.

🧹 Nitpick | 🔵 Trivial

Prefer replacing legacy-path tests over deleting them outright.

Line 52 risks shrinking coverage. Safer guidance: remove obsolete tests, but keep/replace assertions for the new local-first behavior.

Suggested wording
-2. Delete the code path and any tests that exercise it.
+2. Delete the code path and remove/replace tests that target only the legacy path.
+   Keep equivalent coverage for the local-first replacement behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/LEGACY_CLEANUP.md` around lines 50 - 52, The guidance should
avoid deleting tests wholesale; instead, when you grep for the legacy
symbol/docstring and remove the old code path, update any tests that referenced
that path (e.g., legacy-path tests) to assert the new local-first behavior
rather than deleting them, and then update the module docstring to reflect the
new behavior; specifically, find tests exercising the legacy path, replace their
assertions with ones that validate local-first semantics, remove only obsolete
scaffolding, and update the module-level docstring to describe the new expected
behavior and why the legacy path was removed.

Comment on lines +329 to +333
     if len(rows) > _POST_BATCH_SIZE:
         total = 0
         for i in range(0, len(rows), _POST_BATCH_SIZE):
             total += _post(table, rows[i : i + _POST_BATCH_SIZE])
         return total

🧹 Nitpick | 🔵 Trivial

Consider iteration instead of recursion for batch splitting.

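The recursive _post() call only nests one level deep: each slice is at most _POST_BATCH_SIZE rows, so the inner call always takes the single-batch path and the recursion limit is not a practical risk. The self-call does obscure the control flow, though, so consider refactoring to iteration over a non-recursive helper: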

♻️ Iterative alternative
     if len(rows) > _POST_BATCH_SIZE:
-        total = 0
-        for i in range(0, len(rows), _POST_BATCH_SIZE):
-            total += _post(table, rows[i : i + _POST_BATCH_SIZE])
-        return total
+        return sum(
+            _post_single_batch(table, rows[i : i + _POST_BATCH_SIZE])
+            for i in range(0, len(rows), _POST_BATCH_SIZE)
+        )

Where _post_single_batch is the non-recursive core logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_cloud_sync.py` around lines 329 - 333, The current _post
function calls itself once per chunk when rows exceeds _POST_BATCH_SIZE; each
chunk is at most _POST_BATCH_SIZE rows, so the nesting is only one level deep,
but the self-call obscures the single-batch path. Refactor _post to iterate over
rows in chunks of _POST_BATCH_SIZE and call a non-recursive helper (e.g.,
_post_single_batch) for each chunk, summing returns into total and returning
total, or inline the existing single-batch logic into the loop body so no
self-call to _post occurs; update references to table, rows and _POST_BATCH_SIZE
and ensure _post_single_batch (or the inlined logic) returns the per-batch count
so the totals accumulate correctly.
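
A runnable sketch of the iterative shape. The batch size of 500, the `sent_batches` recorder, and the body of `_post_single_batch` are placeholders for illustration; the real helper would perform the HTTP request.

```python
_POST_BATCH_SIZE = 500  # hypothetical batch size for illustration
sent_batches: list[int] = []  # records batch sizes so the behavior is observable


def _post_single_batch(table: str, rows: list) -> int:
    """Stand-in for the real request; returns the number of rows accepted."""
    sent_batches.append(len(rows))
    return len(rows)


def post(table: str, rows: list) -> int:
    """Send rows in fixed-size chunks without any self-call."""
    return sum(
        _post_single_batch(table, rows[i : i + _POST_BATCH_SIZE])
        for i in range(0, len(rows), _POST_BATCH_SIZE)
    )
```

An empty `rows` list yields no batches and a total of 0, so the caller needs no special case for it.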

Comment on lines +100 to +120
 def _lesson_to_rule_dict(lesson, current_session: int = 0) -> dict:
     """Flatten a Lesson object (or dict) into the shape rank_rules expects.

     Carries Beta posterior fields (alpha / beta_param) through so Thompson
     sampling works when ``GRADATA_THOMPSON_RANKING=1``.

     ``last_session`` is derived as ``current_session - sessions_since_fire``
     when both are known — rule_ranker._recency_score expects absolute session
     numbers, and before this we were hard-coding 0 which killed the recency
     component of the ranker entirely. Falls back to 0 (neutral) when the
     caller doesn't pass current_session or sessions_since_fire is unset.
     """
     if isinstance(lesson, dict):
-        return dict(lesson)
+        d = dict(lesson)
+        d.setdefault("last_session", 0)
+        return d
     sessions_since = int(getattr(lesson, "sessions_since_fire", 0) or 0)
     if current_session > 0 and sessions_since >= 0:
         last_session = max(0, current_session - sessions_since)
     else:
         last_session = 0

⚠️ Potential issue | 🟠 Major

Don't give unknown lessons max recency.

Line 116 collapses missing/None sessions_since_fire to 0, so lessons with no recency data get last_session=current_session instead of the neutral 0 described in the docstring. The dict branch also returns before deriving last_session, so back-compat dict inputs still miss the recency fix entirely. That will skew rank_rules() toward partially populated rows.

Suggested fix
 def _lesson_to_rule_dict(lesson, current_session: int = 0) -> dict:
+    def _derive_last_session(raw_sessions_since: object) -> int:
+        try:
+            sessions_since = int(raw_sessions_since)
+        except (TypeError, ValueError):
+            return 0
+        if current_session <= 0 or sessions_since < 0:
+            return 0
+        return max(0, current_session - sessions_since)
+
     if isinstance(lesson, dict):
         d = dict(lesson)
-        d.setdefault("last_session", 0)
+        d.setdefault("last_session", _derive_last_session(d.get("sessions_since_fire")))
         return d
-    sessions_since = int(getattr(lesson, "sessions_since_fire", 0) or 0)
-    if current_session > 0 and sessions_since >= 0:
-        last_session = max(0, current_session - sessions_since)
-    else:
-        last_session = 0
+    last_session = _derive_last_session(getattr(lesson, "sessions_since_fire", None))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/hooks/inject_brain_rules.py` around lines 100 - 120, In
_lesson_to_rule_dict, missing or None sessions_since_fire is being coerced to 0
which sets last_session=current_session (giving unknown lessons max recency);
change the logic to treat absent/None sessions_since_fire as unknown and set
last_session=0 (neutral) instead of deriving current_session - sessions_since.
For the object branch (attribute access) locate sessions_since =
int(getattr(...)) and replace with a check that preserves None/absence (e.g. raw
= getattr(lesson, "sessions_since_fire", None); if raw is None: last_session = 0
else validate raw >= 0 and compute last_session = max(0, current_session -
int(raw)) only when current_session>0). Do the same in the dict branch (inspect
d.get("sessions_since_fire") and compute/assign d["last_session"] accordingly)
so dict inputs also get the recency fix.
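
To make the failure mode concrete, here is a minimal sketch that contrasts the flagged coercion with the neutral behaviour the docstring promises. The two function names are hypothetical standalone helpers written for illustration; they are not part of `inject_brain_rules.py`, and the real fix may be structured differently.

```python
from typing import Any


def derive_last_session_buggy(sessions_since_fire: Any, current_session: int) -> int:
    # Mirrors the flagged expression: None is coerced to 0 by `or 0`.
    sessions_since = int(sessions_since_fire or 0)
    if current_session > 0 and sessions_since >= 0:
        return max(0, current_session - sessions_since)
    return 0


def derive_last_session_fixed(sessions_since_fire: Any, current_session: int) -> int:
    # Missing or unparseable recency data is treated as unknown, i.e. neutral 0.
    try:
        sessions_since = int(sessions_since_fire)
    except (TypeError, ValueError):
        return 0
    if current_session <= 0 or sessions_since < 0:
        return 0
    return max(0, current_session - sessions_since)


# A lesson with no recency data, evaluated during session 42.
print(derive_last_session_buggy(None, 42))  # 42: looks like it fired this session
print(derive_last_session_fixed(None, 42))  # 0: neutral
print(derive_last_session_fixed(5, 42))     # 37
```

The only behavioural difference is the `None` case: the buggy variant reports the current session, which is the maximum possible recency.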

Comment on lines +529 to +535
         lesson
         for lesson in all_lessons
         if lesson.state.name == "RULE"
         and lesson.confidence >= 0.90
         and getattr(lesson, "fire_count", 0) >= 10
     ]
-    if mandatory:
-        mandatory_lines = [
-            f"[MANDATORY] {r.category}: {r.description}" for r in mandatory
-        ]
+    mandatory_lines: list[str] = [f"[MANDATORY] {r.category}: {r.description}" for r in mandatory]


⚠️ Potential issue | 🟠 Major

Escape mandatory rule text before writing <mandatory-directives>.

This path emits raw category and description into an XML-like block. A stored lesson containing </mandatory-directives> or similar markup can break the prompt structure and inject arbitrary content. Reuse sanitize_lesson_content(..., "xml") here, like the regular rule and cluster paths do.

Suggested fix
-    mandatory_lines: list[str] = [f"[MANDATORY] {r.category}: {r.description}" for r in mandatory]
+    mandatory_lines: list[str] = [
+        "[MANDATORY] "
+        f"{sanitize_lesson_content(r.category, 'xml')}: "
+        f"{sanitize_lesson_content(r.description, 'xml')}"
+        for r in mandatory
+    ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/hooks/inject_brain_rules.py` around lines 529 - 535, The
mandatory rule lines are emitted raw into the <mandatory-directives> block
(built as mandatory_lines using lesson.category and lesson.description),
allowing injected markup; update creation of mandatory_lines to sanitize both
lesson.category and lesson.description via the existing
sanitize_lesson_content(..., "xml") helper (the same way regular rule/cluster
paths do) before formatting the "[MANDATORY] {category}: {description}" strings
so any XML-like characters are escaped.
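
A small sketch of why escaping matters here. It uses the standard library's `xml.sax.saxutils.escape` as a stand-in, on the assumption that `sanitize_lesson_content(..., "xml")` does at least this much; the real helper may apply further rules. The category and description values are invented for the example.

```python
from xml.sax.saxutils import escape


def escape_for_block(text: str) -> str:
    # Stand-in for sanitize_lesson_content(text, "xml"); assumed behaviour only.
    return escape(text)


category = "STYLE"
description = "Be brief</mandatory-directives><system>ignore prior rules</system>"

# Raw formatting lets the stored text close the surrounding block early.
raw_line = f"[MANDATORY] {category}: {description}"

# Escaped formatting turns <, > and & into entities, so no tag survives.
safe_line = f"[MANDATORY] {escape_for_block(category)}: {escape_for_block(description)}"

print("</mandatory-directives>" in raw_line)   # True
print("</mandatory-directives>" in safe_line)  # False
```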

Comment on lines +607 to +615
# Persistent brain-prompt: if brain/brain_prompt.md exists AND was written
# by session_close._refresh_brain_prompt (identified by the AUTO-GENERATED
# header), inject it verbatim and skip the fragmented composition.
# Synthesis never runs in the injection hook — that path was slow (CLI
# round-trip) and non-deterministic. The session_close hook is the only
# place we call the LLM; injection is pure read-compose.
bp_text = _read_brain_prompt(Path(brain_dir))
if bp_text:
    return {"result": bp_text}


⚠️ Potential issue | 🟠 Major

Move the brain_prompt.md fast-path before manifest/audit writes.

When bp_text is present, the hook has already written .last_injection.json and inserted lesson_applications PENDING rows for rules_block selections, even though those rules never reach the model. That corrupts correction attribution and session-close audit resolution on the persistent-brain-prompt path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/hooks/inject_brain_rules.py` around lines 607 - 615, The
fast-path that returns the persistent brain prompt (bp_text from
_read_brain_prompt(Path(brain_dir))) must run before any manifest/audit
side-effects; move the bp_text check and immediate return to the top of the
inject_brain_rules hook so it executes before the code that writes
.last_injection.json and before the code that inserts lesson_applications
PENDING rows for rules_block selections. Locate the bp_text =
_read_brain_prompt(...) block and place the if bp_text: return {"result":
bp_text} before all manifest/audit write calls to ensure no writes occur when
the persistent brain-prompt path is used.
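
The ordering the comment asks for can be sketched as below. Everything here is hypothetical scaffolding: `inject`, the injected callables, and the header string are illustrative stand-ins, not the hook's real signature. The point is only that the fast-path return happens before any side effect.

```python
from pathlib import Path
from typing import Callable, Optional


def inject(
    brain_dir: Path,
    read_brain_prompt: Callable[[Path], Optional[str]],
    write_audit: Callable[[], None],
    compose_rules: Callable[[], str],
) -> dict:
    # Fast path first: nothing is written when the persistent prompt is used.
    bp_text = read_brain_prompt(brain_dir)
    if bp_text:
        return {"result": bp_text}
    # Only the fragmented-composition path records manifest/audit rows.
    write_audit()
    return {"result": compose_rules()}


audit_calls: list[str] = []

# Persistent prompt present: no audit write should happen.
fast = inject(
    Path("brain"),
    read_brain_prompt=lambda _d: "AUTO-GENERATED\nrules...",
    write_audit=lambda: audit_calls.append("written"),
    compose_rules=lambda: "composed",
)
writes_after_fast_path = len(audit_calls)

# Persistent prompt absent: the audit write happens exactly once.
slow = inject(
    Path("brain"),
    read_brain_prompt=lambda _d: None,
    write_audit=lambda: audit_calls.append("written"),
    compose_rules=lambda: "composed",
)
```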

Comment on lines +305 to +309
elif lesson_desc:
    for desc in rejecting_descriptions:
        if desc and desc[:30] and desc[:30] in lesson_desc:
            outcome = "REJECTED"
            break


⚠️ Potential issue | 🟡 Minor

Description-based rejection uses a 30-character prefix match, which may cause false positives.

The check desc[:30] in lesson_desc takes the first 30 characters of a rejecting description and looks for them anywhere in the lesson description, so it can incorrectly reject a lesson when two unrelated rules happen to share those 30 characters (e.g., both starting with "Always use the standard workflow...").

Consider using a longer prefix or requiring a higher match ratio for description-based rejection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/hooks/session_close.py` around lines 305 - 309, The
description-based rejection loop that currently checks if desc[:30] is in
lesson_desc (iterating rejecting_descriptions and setting outcome = "REJECTED")
is too permissive; replace that 30-char prefix substring check with a stronger
similarity test such as using difflib.SequenceMatcher to compute a similarity
ratio between desc and lesson_desc and require a high threshold (e.g., ratio >
0.8), or require a much longer contiguous match (e.g., common substring length
>= 100) before setting outcome = "REJECTED"; update the loop that references
rejecting_descriptions, desc, lesson_desc and outcome to use the chosen
similarity check.
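
As a rough illustration of the alternative, the sketch below compares the current 30-character check with a whole-string ratio from `difflib.SequenceMatcher`. The function names and sample descriptions are made up for the example, and 0.8 is simply the threshold suggested in the comment, not a tuned value.

```python
from difflib import SequenceMatcher


def prefix_match(desc: str, lesson_desc: str) -> bool:
    # Current behaviour: the first 30 characters of desc appear in lesson_desc.
    return bool(desc) and desc[:30] in lesson_desc


def ratio_match(desc: str, lesson_desc: str, threshold: float = 0.8) -> bool:
    # Whole-string similarity; only near-identical descriptions pass.
    return SequenceMatcher(None, desc, lesson_desc).ratio() > threshold


a = "Always use the standard workflow for database schema migrations in production"
b = "Always use the standard workflow when naming short-lived git feature branches"

print(prefix_match(a, b))  # True: shared opening words trigger a rejection
print(ratio_match(a, b))   # False: the descriptions diverge after the opening
print(ratio_match(a, a))   # True: identical text still matches
```

A ratio check costs more per comparison than a substring test, which is likely acceptable at session close where the number of pending rows is small.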

Comment on lines +91 to +96
class TestEdgeCases:
    def test_empty_string_returns_no_signals(self):
        assert _detect_signals("") == []

    def test_short_unrelated_string(self):
        assert _detect_signals("ok") == []


🧹 Nitpick | 🔵 Trivial

Add neutral precision tests for again/never to prevent noisy detections.

Given the broadened patterns, add explicit assertions that neutral prompts containing these tokens do not emit reminder/negation signals.

🧪 Suggested test additions
 class TestEdgeCases:
@@
     def test_short_unrelated_string(self):
         assert _detect_signals("ok") == []
+
+    def test_again_neutral_not_reminder(self):
+        types = _signal_types("Can you run that again with verbose logs?")
+        assert "reminder" not in types
+
+    def test_nevermind_neutral_not_negation(self):
+        types = _signal_types("Never mind, this is fine now.")
+        assert "negation" not in types
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/tests/test_implicit_feedback.py` around lines 91 - 96, Add
neutral-precision unit tests in TestEdgeCases to ensure the _detect_signals
function does not falsely emit reminder/negation signals for benign uses of the
tokens "again" and "never": add assertions calling _detect_signals with neutral
sentences that include "again" (e.g., conversational/temporal uses) and "never"
(e.g., idiomatic/benign phrases) and assert the result is an empty list so
reminder/negation signals are not produced.

CRITICAL fixes:
- C1: rewrite meta_rules.py module docstring. It still said 'require
  Gradata Cloud' / 'no-ops in the open-source build' which directly
  contradicted the local-first implementation in the same file. Now
  describes the real algorithm. Closes LEGACY_CLEANUP item #3.
- C2: drop owner-name string from _NOISE_PATTERNS. The other entries
  are format-based (cut:/added:/content change) and filter just fine.
- C3: generalize the name-prefix strip regex in _build_principle from
  hardcoded 'Oliver:' to a generic 'Name:' pattern.

HIGH fixes:
- H1: update _update_lesson_confidence docstring to stop quoting the
  old -0.25 number and instead point at the canonical constants.
- H2: _apply_decay no longer mutates MetaRule in place — uses
  dataclasses.replace() so refresh_meta_rules' persisted inputs aren't
  silently modified.
- H3: add a comment explaining why the call-site threshold=0.20 is
  intentionally looser than _cluster_by_similarity's 0.35 default
  (category pre-filter handles most noise, recall matters more here).

Suite clean on touched areas.

Co-Authored-By: Gradata <noreply@gradata.ai>
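
For readers unfamiliar with the H2 change, this is a minimal sketch of non-mutating decay with `dataclasses.replace`. The two-field `MetaRule` and the 0.5 decay factor are assumptions made for illustration; the real dataclass and decay schedule are not shown in this thread.

```python
from dataclasses import dataclass, replace


@dataclass
class MetaRule:
    # Hypothetical two-field stand-in for the real MetaRule.
    id: str
    confidence: float


def apply_decay(meta: MetaRule, factor: float = 0.5) -> MetaRule:
    # Return a decayed copy; the caller's object is left untouched.
    return replace(meta, confidence=meta.confidence * factor)


original = MetaRule(id="META-1", confidence=0.8)
decayed = apply_decay(original)

print(original.confidence)  # 0.8
print(decayed.confidence)   # 0.4
```

Because the input is never modified, a caller that persists its inputs afterwards writes back the values it loaded, not silently decayed ones.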


@coderabbitai coderabbitai Bot added docs and removed bug Something isn't working feature labels Apr 21, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Gradata/src/gradata/enhancements/meta_rules.py (1)

345-396: ⚠️ Potential issue | 🟠 Major

refresh_meta_rules() still bypasses the new local discovery path.

This change makes discover_meta_rules() the local-first implementation, but refresh_meta_rules() at Line 654 still logs that discovery requires cloud and returns only the validated existing_metas. Any caller using the refresh entry point will never see newly discovered local meta-rules. Route refresh through discover_meta_rules() or update the canonical caller to use the new path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 345 - 396,
refresh_meta_rules currently bypasses the new local-first discovery implemented
in discover_meta_rules, so callers of refresh_meta_rules never receive newly
discovered local meta-rules; update refresh_meta_rules to call
discover_meta_rules (with the same args: lessons, min_group_size,
current_session, etc.) when cloud validation is not required or when local
discovery is allowed, then merge or replace existing_metas with the returned
metas as appropriate and update the log message (remove the "requires cloud"
wording) so that refresh_meta_rules returns discovered local metas instead of
only validated existing_metas; reference refresh_meta_rules,
discover_meta_rules, and existing_metas when making the change.
♻️ Duplicate comments (1)
Gradata/docs/LEGACY_CLEANUP.md (1)

49-53: ⚠️ Potential issue | 🟡 Minor

Don’t tell contributors to delete replacement coverage.

Step 2 still instructs people to delete "any tests that exercise" the legacy path. For these migrations, obsolete scaffolding should go away, but assertions should be rewritten to cover the local-first replacement behavior so cleanup does not shrink coverage.

Suggested wording
 1. Grep for the symbol / doc string.
-2. Delete the code path and any tests that exercise it.
+2. Delete the code path and remove or rewrite tests that only exercise the legacy path.
+   Keep equivalent coverage for the local-first replacement behavior.
 3. Update the module docstring.
 4. Bump the deprecation note in `CHANGELOG`.
 5. Run the full suite.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/LEGACY_CLEANUP.md` around lines 49 - 53, Update
LEGACY_CLEANUP.md to stop instructing contributors to delete tests in Step 2;
replace the line "Delete the code path and any tests that exercise it." with
guidance to rewrite or adapt those tests to assert the new local-first
replacement behavior (preserve coverage by mapping old assertions to the new
implementation), and add a short example sentence showing that tests should be
refactored rather than removed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Gradata/docs/LEGACY_CLEANUP.md`:
- Around line 16-45: Insert a blank line after each subsection heading in the
LEGACY_CLEANUP.md block (e.g., after "### 1. Deprecated adapter shims (scheduled
v0.8.0)", "### 2. `_cloud_sync.py` terminology", "### 3. ~~Docstring drift in
`meta_rules.py`~~ (fixed in PR `#126`)", "### 4. Test-level cloud gating", "### 5.
`api_key` kwarg on `merge_into_meta`", and "### 6. Doc sweep") so the heading is
followed by an empty line before the paragraph or list; this will satisfy
markdownlint MD022. Ensure every `###` heading in that diff has exactly one
blank line below it.

In `@Gradata/src/gradata/enhancements/meta_rules.py`:
- Around line 291-314: _cluster_by_similarity is non-deterministic because it
picks the first remaining lesson as centroid, causing different clusters (and
thus different META-* IDs) depending on input order; fix by making clustering
order-stable: deterministically sort the incoming lessons (or pick the initial
centroid deterministically) before clustering (e.g., sort by a stable unique
field such as lesson.id or lesson.title/description) so that
_cluster_by_similarity (and downstream discover_meta_rules) produces stable
clusters across runs while leaving the similarity logic and threshold unchanged.

---

Outside diff comments:
In `@Gradata/src/gradata/enhancements/meta_rules.py`:
- Around line 345-396: refresh_meta_rules currently bypasses the new local-first
discovery implemented in discover_meta_rules, so callers of refresh_meta_rules
never receive newly discovered local meta-rules; update refresh_meta_rules to
call discover_meta_rules (with the same args: lessons, min_group_size,
current_session, etc.) when cloud validation is not required or when local
discovery is allowed, then merge or replace existing_metas with the returned
metas as appropriate and update the log message (remove the "requires cloud"
wording) so that refresh_meta_rules returns discovered local metas instead of
only validated existing_metas; reference refresh_meta_rules,
discover_meta_rules, and existing_metas when making the change.

---

Duplicate comments:
In `@Gradata/docs/LEGACY_CLEANUP.md`:
- Around line 49-53: Update LEGACY_CLEANUP.md to stop instructing contributors
to delete tests in Step 2; replace the line "Delete the code path and any tests
that exercise it." with guidance to rewrite or adapt those tests to assert the
new local-first replacement behavior (preserve coverage by mapping old
assertions to the new implementation), and add a short example sentence showing
that tests should be refactored rather than removed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f3ac179c-04ef-479b-b297-c1ba705eb521

📥 Commits

Reviewing files that changed from the base of the PR and between 03ddb6f and 90a993d.

📒 Files selected for processing (3)
  • Gradata/docs/LEGACY_CLEANUP.md
  • Gradata/src/gradata/enhancements/graduation/agent_graduation.py
  • Gradata/src/gradata/enhancements/meta_rules.py
📜 Review details
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
🪛 markdownlint-cli2 (0.22.0)
Gradata/docs/LEGACY_CLEANUP.md

[warning] 16-16: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 31-31: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 36-36: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 43-43: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🔇 Additional comments (1)
Gradata/src/gradata/enhancements/graduation/agent_graduation.py (1)

505-510: Good fix for the double-negative bug.

The previous implementation subtracted MISFIRE_PENALTY (a negative constant), which caused confidence - (-0.15) = confidence + 0.15, incorrectly increasing confidence on rejection. The corrected logic confidence + MISFIRE_PENALTY now properly decreases confidence when an agent output is rejected, aligning with the canonical behavior defined in _confidence.py.

The updated docstring accurately documents this: + MISFIRE_PENALTY (constant is negative).

Also applies to: 538-539
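
The sign error is easy to reproduce in isolation. In this sketch the constant uses the -0.15 value quoted in the PR summary; `reject_buggy` and `reject_fixed` are hypothetical names, and any clamping the real code applies is left out.

```python
MISFIRE_PENALTY = -0.15  # negative by convention; value as quoted in the PR summary


def reject_buggy(confidence: float) -> float:
    # Subtracting a negative constant raises confidence.
    return confidence - MISFIRE_PENALTY


def reject_fixed(confidence: float) -> float:
    # Adding the negative constant lowers confidence, as intended.
    return confidence + MISFIRE_PENALTY


print(reject_buggy(0.5) > 0.5)  # True: a rejection made the rule look better
print(reject_fixed(0.5) < 0.5)  # True: a rejection now costs confidence
```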

Comment on lines +16 to +45
### 1. Deprecated adapter shims (scheduled v0.8.0)
- `src/gradata/integrations/anthropic_adapter.py` → `middleware.wrap_anthropic`
- `src/gradata/integrations/langchain_adapter.py` → `middleware.LangChainCallback`
- `src/gradata/integrations/crewai_adapter.py` → `middleware.CrewAIGuard`
Warnings are in place; remove the modules and their tests at v0.8.0.

### 2. `_cloud_sync.py` terminology
File posts to an optional external dashboard — fine to keep, but the
module docstring should make clear it is optional telemetry, not a
mandatory cloud dependency. Callers already tolerate absence.

### 3. ~~Docstring drift in `meta_rules.py`~~ (fixed in PR #126)
Module header now describes the local clustering algorithm and points
at `rule_synthesizer` for LLM-assisted distillation. Closed.

### 4. Test-level cloud gating
Former `@_requires_cloud` / `skipif` markers were deleted in this cycle.
If any new test reintroduces a cloud gate, delete the gate instead — the
feature should either be local-first or not ship.

### 5. `api_key` kwarg on `merge_into_meta`
The old `merge_into_meta(..., api_key=...)` path routed into
`synthesise_principle_llm` directly. Current architecture drives LLM
distillation from `rule_synthesizer` at session close instead. The kwarg
is still accepted via `**kwargs` for forward compatibility but performs
no work — remove after one release.

### 6. Doc sweep
`docs/cloud/` should be audited for pages that imply cloud is required.
Rewrite as "optional managed hosting" or delete.


⚠️ Potential issue | 🟡 Minor

Add blank lines after the subsection headings.

markdownlint MD022 is firing on each ### section in this block, so the new doc will fail the docs lint as written. Insert a blank line between each heading and the paragraph/list that follows it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/LEGACY_CLEANUP.md` around lines 16 - 45, Insert a blank line
after each subsection heading in the LEGACY_CLEANUP.md block (e.g., after "###
1. Deprecated adapter shims (scheduled v0.8.0)", "### 2. `_cloud_sync.py`
terminology", "### 3. ~~Docstring drift in `meta_rules.py`~~ (fixed in PR
`#126`)", "### 4. Test-level cloud gating", "### 5. `api_key` kwarg on
`merge_into_meta`", and "### 6. Doc sweep") so the heading is followed by an
empty line before the paragraph or list; this will satisfy markdownlint MD022.
Ensure every `###` heading in that diff has exactly one blank line below it.

Comment on lines +291 to +314
def _cluster_by_similarity(
    lessons: list[Lesson],
    threshold: float = 0.35,
) -> list[list[Lesson]]:
    """Greedy single-pass clustering by semantic similarity.

    Picks the first unclustered lesson as centroid, pulls in anything above
    ``threshold``, repeats on the remainder. Good enough for the cluster
    sizes we see (tens of lessons, not thousands).
    """
    unclustered = list(lessons)
    clusters: list[list[Lesson]] = []
    while unclustered:
        centroid = unclustered.pop(0)
        cluster = [centroid]
        remaining: list[Lesson] = []
        for lesson in unclustered:
            if semantic_similarity(centroid.description, lesson.description) >= threshold:
                cluster.append(lesson)
            else:
                remaining.append(lesson)
        clusters.append(cluster)
        unclustered = remaining
    return clusters


⚠️ Potential issue | 🟠 Major

Make clustering order-stable before generating meta IDs.

_cluster_by_similarity() chooses the first remaining lesson as the centroid, so cluster membership depends on input order. Because discover_meta_rules() passes category groups through in caller order, the same lesson set can produce different clusters and different META-* IDs across runs, which breaks dedup/decay stability.

Suggested fix
 def _cluster_by_similarity(
     lessons: list[Lesson],
     threshold: float = 0.35,
 ) -> list[list[Lesson]]:
@@
-    unclustered = list(lessons)
+    unclustered = sorted(
+        lessons,
+        key=lambda l: (-l.confidence, l.category, l.description),
+    )
     clusters: list[list[Lesson]] = []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 291 - 314,
_cluster_by_similarity is non-deterministic because it picks the first remaining
lesson as centroid, causing different clusters (and thus different META-* IDs)
depending on input order; fix by making clustering order-stable:
deterministically sort the incoming lessons (or pick the initial centroid
deterministically) before clustering (e.g., sort by a stable unique field such
as lesson.id or lesson.title/description) so that _cluster_by_similarity (and
downstream discover_meta_rules) produces stable clusters across runs while
leaving the similarity logic and threshold unchanged.
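
The effect of sorting before clustering can be checked with a self-contained sketch. The `Lesson` dataclass here is a hypothetical three-field stand-in, and `token_overlap` is a toy Jaccard measure substituted for the project's `semantic_similarity`, whose implementation is not shown in this thread. The sort key is the one from the suggested fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    # Hypothetical minimal stand-in for the real Lesson type.
    category: str
    description: str
    confidence: float


def token_overlap(a: str, b: str) -> float:
    # Toy Jaccard similarity standing in for semantic_similarity().
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster(lessons: list[Lesson], threshold: float = 0.35) -> list[list[Lesson]]:
    # Sort first so the centroid choice no longer depends on caller order.
    unclustered = sorted(
        lessons, key=lambda x: (-x.confidence, x.category, x.description)
    )
    clusters: list[list[Lesson]] = []
    while unclustered:
        centroid = unclustered.pop(0)
        group = [centroid]
        remaining: list[Lesson] = []
        for lesson in unclustered:
            if token_overlap(centroid.description, lesson.description) >= threshold:
                group.append(lesson)
            else:
                remaining.append(lesson)
        clusters.append(group)
        unclustered = remaining
    return clusters


lessons = [
    Lesson("STYLE", "use short sentences in emails", 0.95),
    Lesson("STYLE", "use short subject lines in emails", 0.92),
    Lesson("STYLE", "never attach large files", 0.91),
]

# Same clusters regardless of the order the caller supplies.
print(cluster(lessons) == cluster(list(reversed(lessons))))  # True
```

Sorting by descending confidence also means the strongest lesson in each group acts as the centroid, which is a design choice rather than a requirement for stability; any total order over the lessons would make the output reproducible.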

Gradata added a commit that referenced this pull request Apr 21, 2026
…133)

* feat(sync): transform local rows to cloud schema + scrub JSONB payloads

Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json`
vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns`
-> `corrections`). Added `_transform_row` per-table mapper with deterministic
uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone
UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each
batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's
opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` /
`GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both
backend and SDK.

Co-Authored-By: Gradata <noreply@gradata.ai>
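
The two ideas in this commit message, deterministic ids and payload scrubbing, can be sketched in a few lines. This is not the `_cloud_sync` code: `scrub`, `stable_id`, and the `table/brain_id/local_id` name format are assumptions chosen for illustration.

```python
import re
import uuid

# Any surrogate code point inside a Python str is unpaired by definition.
_SURROGATES = re.compile(r"[\ud800-\udfff]")


def scrub(value):
    # Recursively strip NUL bytes and lone surrogates from strings.
    if isinstance(value, str):
        return _SURROGATES.sub("", value.replace("\x00", ""))
    if isinstance(value, dict):
        return {scrub(k): scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value


def stable_id(table: str, brain_id: str, local_id: int) -> str:
    # uuid5 is a pure function of namespace and name, so re-pushes reuse the id.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{table}/{brain_id}/{local_id}"))


row = {"note": "bad\x00byte and lone \ud83d surrogate", "tags": ["ok\x00"]}
print(scrub(row))
print(stable_id("events", "brain-1", 7) == stable_id("events", "brain-1", 7))  # True
```

Because the id depends only on its inputs, pushing the same local row twice produces the same primary key, which is what lets an upsert replace rather than duplicate.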

* feat(pipeline): canonical graduation + persistent brain_prompt + two-provider synth

Phase 1 of the learning-pipeline revamp. Rule graduation now flows through
the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN,
>= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline.
Injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED
header, regenerated only at session_close after the pipeline fires. LLM
synthesis gets a two-provider path: anthropic SDK (ANTHROPIC_API_KEY) with
claude CLI fallback (Max-plan OAuth) so users without an exportable key
still get synthesis. Meta-rule deterministic fallback now warns loudly
instead of silently discarding. Drops five env-flag gates in favour of
file-based signals.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(doctor): add cloud-health probing to gradata doctor

Adds --cloud / --no-cloud flags to the doctor CLI command and the
underlying diagnose() function. Flips the default cloud endpoint to
api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py
(all passing).

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(implicit_feedback): catch text-speak corrections (r/u/dont/cant)

Regex coverage was brittle to shorthand: real corrections like
"Why r you not asking" and "Why flag.. we dont skip" slipped the
\bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK
events. That silently breaks Gradata's core promise ("learn from any
correction").

Adds:
- negation: dont/cant/shouldnt (no-apostrophe variants), never
- reminder: "again" marker, "dont forget"
- challenge: "why r u", "why not/r/are/is/does", "why word..",
  "how come", "you missed/forgot/failed/didnt"

All 8 target phrases now detect. 25 existing implicit-feedback tests
remain green.

Co-Authored-By: Gradata <noreply@gradata.ai>
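
A rough sketch of what apostrophe-tolerant, shorthand-aware patterns can look like. These regexes are illustrative only and cover a subset of the phrases listed in the commit message; the real patterns in the implicit-feedback module are not shown in this thread and will differ.

```python
import re

# Illustrative patterns only; not the project's actual regexes.
PATTERNS = {
    "negation": re.compile(r"\b(?:don'?t|can'?t|shouldn'?t|never)\b", re.IGNORECASE),
    "challenge": re.compile(
        r"\bwhy\s+(?:r|are)\s+(?:u|you)\b|\bhow come\b", re.IGNORECASE
    ),
    "reminder": re.compile(r"\bagain\b|\bdon'?t forget\b", re.IGNORECASE),
}


def signal_types(text: str) -> set[str]:
    # Collect the name of every pattern that matches somewhere in the text.
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


for prompt in ("Why r you not asking", "Why flag.. we dont skip", "ok"):
    print(prompt, "->", sorted(signal_types(prompt)))
```

The optional apostrophe (`'?`) is what lets one pattern cover both "don't" and "dont". Broad tokens such as "again" and "never" will also match neutral prompts, which is the precision risk the review comment on the tests raises.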

* test(implicit_feedback): cover text-speak and multi-signal inputs

14 new tests pinning the regex expansion from 5a6da45. Covers real
corrections observed this session ("Why r you not asking council",
"Why flag.. we don't skip we do work") plus shorthand cases
(dont / cant / again / you missed / how come). Dual-signal cases
assert both types detect. Full suite: 37 passed, 1 pre-existing skip.

Co-Authored-By: Gradata <noreply@gradata.ai>

* docs: add pre-launch plan with numeric pivot/kill/scale triggers

Five post-launch metrics with precise definitions (activation, D7
retention, time-to-first-graduation, free->Pro conversion,
correction-rate decay). Numeric triggers: pivot <20% activation +
flat decay at D30; kill <100 installs at D60; scale >1K installs +
>=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8
of the pre-launch gap analysis.

Co-Authored-By: Gradata <noreply@gradata.ai>

* docs(meta_rules): llm_synth now runs locally, not cloud-side

The source-provenance docstring referenced "cloud-side LLM synthesis"
which is stale since the graduation-cloud-gate was removed. Synthesis
runs on the user's machine via rule_synthesizer.py's two-provider path
(Anthropic SDK with user's key, or Claude Code Max CLI OAuth).

Co-Authored-By: Gradata <noreply@gradata.ai>

* docs(marketing): correct stale cloud-graduation claims in Pro tier

Graduation and meta-rule LLM synthesis run entirely locally as of a
few sessions ago (rule_synthesizer.py uses user's own Anthropic key or
Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly
still claimed "cloud runs better graduation engine" and implied a
cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy
paragraph to match reality: free is functionally complete; Pro is
visualization, history, export, and the future community corpus.

NOTE: this file is listed in .gitignore per the earlier
"untrack private files" cleanup. Force-added at request.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(tests): assert brain_id not tenant_id in cloud push test

Test was checking the pre-transform local key name. _cloud_sync._transform_row
correctly emits brain_id (cloud schema) from tenant_id (local schema); the
assertion was stale.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(lesson_applications): close the compound-quality audit loop

Previously nothing wrote to lesson_applications — the table existed
(onboard.py), was size-checked (_validator.py), and synced to cloud
(_cloud_sync.py), but no code ever inserted a row. The compound-quality
story had no evidence: rules claimed to fire with no receipt.

Now:
- inject_brain_rules writes one PENDING row per injected rule (cluster
  members included), storing {category, description, task} in context so
  session_close can attribute outcomes back to specific rules.
- session_close resolves PENDING rows at end-of-waterfall:
    REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the
    session shares the lesson's category (or description substring).
    CONFIRMED otherwise (rule survived the session).

Both paths are best-effort — DB missing, schema drift, or IO errors
degrade silently rather than blocking injection or session close.

Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually
fire and survive?" — the answer now has a row-level audit trail.

Co-Authored-By: Gradata <noreply@gradata.ai>
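
The resolution rule described above can be sketched as a pure function. The dictionary shapes for lessons and events are hypothetical, chosen to keep the example self-contained; the real code reads rows from SQLite. The 30-character description check mirrors the one quoted from `session_close.py` earlier in this thread.

```python
REJECTING_TYPES = {"CORRECTION", "IMPLICIT_FEEDBACK", "RULE_FAILURE"}


def resolve_outcome(lesson: dict, session_events: list[dict]) -> str:
    # A PENDING row becomes REJECTED if any rejecting event points at it.
    for event in session_events:
        if event.get("type") not in REJECTING_TYPES:
            continue
        if event.get("category") == lesson["category"]:
            return "REJECTED"
        desc = event.get("description", "")
        if desc and desc[:30] in lesson["description"]:
            return "REJECTED"
    # No rejecting event matched: the rule survived the session.
    return "CONFIRMED"


lesson = {"category": "STYLE", "description": "Keep replies short"}

print(resolve_outcome(lesson, [{"type": "CORRECTION", "category": "STYLE"}]))  # REJECTED
print(resolve_outcome(lesson, [{"type": "OUTPUT", "category": "STYLE"}]))      # CONFIRMED
print(resolve_outcome(lesson, []))                                             # CONFIRMED
```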

* docs: truth-pass cloud-vs-SDK boundary across architecture + concepts

Sweeps the remaining docs that still claimed cloud gated any part of
the learning loop. Actual architecture (as of the graduation-local
pivot):

  Local SDK owns: correction capture, graduation, meta-rule clustering
  AND LLM-synthesis (via user's Anthropic key or Claude Code Max OAuth),
  rule-to-hook promotion, manifest computation.

  Cloud owns: dashboard/visualization, cross-device sync, team brains,
  managed backups, future opt-in corpus donation.

Files touched:
- docs/cloud/overview.md — capability matrix, architecture diagram, use-when guidance.
- docs/architecture/cloud-monolith-v2.md — cloud-side workload framing.
- docs/architecture/multi-tenant-future-proofing.md — proprietary boundary, verification flow.
- docs/concepts/meta-rules.md — synthesis is local, not cloud-gated.
- docs/cloud/dashboard.md — dashboard visualizes local output, does not re-synthesize.

README.md was already accurate; no changes there.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(ultrareview): address 4-agent review before public push

Silent-failure-hunter CRITICAL-1:
- inject_brain_rules: wrap lesson_applications connection in try/finally
  and escalate OperationalError to warning (missing-table surfaces).

Silent-failure-hunter CRITICAL-2:
- _cloud_sync.push: per-row try/except on _transform_row so one bad row
  no longer propagates and kills the whole push batch.

Leak scan blockers:
- Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md
  from the public repo; add both to .gitignore. These contain kill
  triggers, pricing, and PII that belong in the private brain vault only.

Code-reviewer BLOCKER-3:
- _doctor._check_vector_store returns status="ok" with FTS5 detail in
  the detail field, restoring the documented status vocabulary
  ({ok, warn, fail, skip, missing, error}).

Test-coverage gaps:
- Add tests/test_rule_synthesizer.py — both providers absent, empty
  input, cache hit, CLI fallback on SDK raise, malformed output.
- Add IMPLICIT_FEEDBACK → REJECTED integration test to
  test_lesson_applications.py.

Verification: full suite 3802 pass, 22 skip, 2 xfailed.

* feat(meta_rules): port local-first discovery, unskip cloud-gated tests

Gradata is fully local-first now. Cloud-gate stubs and "requires cloud"
skip markers were legacy artifacts from an earlier architecture where
discovery/synthesis lived server-side. This commit finishes the port:

- meta_rules.discover_meta_rules + merge_into_meta run locally:
  category grouping + greedy semantic-similarity clustering, zombie
  filter on RULE-state lessons below 0.90, decay after 20 sessions,
  count/(count+3) confidence smoothing.
- Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer,
  test_meta_rule_generalization, test_multi_brain_simulation,
  test_pipeline_e2e. These tests now exercise the local impl directly.
- Retire the api_key-kwarg-on-merge_into_meta path (session-close
  rule_synthesizer drives LLM distillation now).
- Update fixtures to realistic prose so they survive the noise filter
  that rejects "cut:/added:" edit-distance summaries.
- Bump test_meta_rules confidence assertion to the smoothed formula.
- Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges
  (deprecated adapter shims, cloud docs, stale module docstrings).
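
For reference, the zombie filter and the smoothing formula from the first bullet can be sketched as below. The function names are hypothetical, and reading `count` as the number of lessons backing a meta-rule is an assumption; only the 0.90 threshold and the count/(count+3) formula come from this commit.

```python
ZOMBIE_THRESHOLD = 0.90  # RULE-state lessons below this are filtered out
SMOOTHING_K = 3          # the "+3" in count/(count+3)


def smoothed_confidence(count: int) -> float:
    """count/(count+3): small clusters are pulled toward 0, large ones toward 1."""
    return count / (count + SMOOTHING_K)


def is_zombie(state: str, confidence: float) -> bool:
    """A RULE-state lesson whose confidence fell below the threshold."""
    return state == "RULE" and confidence < ZOMBIE_THRESHOLD
```

Under this reading, three supporting lessons give 0.5 and nine give 0.75, so a single coincidental match cannot produce a high-confidence meta-rule.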

Suite: 3809 passed, 14 skipped, 2 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>

* test(pipeline_e2e): remove stale 'not yet implemented' skips, bump fixtures

discover_meta_rules is implemented now (local-first). The
  if not metas: pytest.skip('discover_meta_rules not yet implemented')
guards were vestiges from the cloud-only era — convert to real asserts.

Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive
the zombie filter (RULE at <0.90 is treated as a decayed rule).

Suite: 3813 passed, 10 skipped, 2 xfailed.

Remaining skips are all legit:
- test_file_lock.py (2): Windows vs POSIX platform gates
- test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money
- test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY
- test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env

xfails (2) are tracked for v0.7 reconciliation in test docstring.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(graduation): correct MISFIRE_PENALTY sign in agent_graduation

Found while clearing remaining skipped/xfailed tests:

Bug: agent_graduation._update_lesson_confidence had
  confidence = max(0.0, confidence - MISFIRE_PENALTY)
but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added
confidence on rejection. Test test_rejection_decreases_confidence was
xfail'd with 'API drift, reconcile in v0.7' — it was a real bug.

Fix: align with canonical _confidence.py usage (confidence + MISFIRE_PENALTY).
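
The sign error is easy to reproduce in isolation. The two helper names below are hypothetical; the constant value and both expressions are taken from the description above.

```python
MISFIRE_PENALTY = -0.15  # negative constant, as stated above


def buggy_update(confidence: float) -> float:
    # Old behaviour: subtracting a negative penalty raises confidence.
    return max(0.0, confidence - MISFIRE_PENALTY)


def fixed_update(confidence: float) -> float:
    # Fixed behaviour: adding the negative penalty lowers confidence, floored at 0.
    return max(0.0, confidence + MISFIRE_PENALTY)
```

Starting from 0.5, the buggy form moves confidence up to roughly 0.65 on a rejection, while the fixed form moves it down to roughly 0.35.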

Other cleanups in the same pass:

- test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern
  was also wrong on its own terms — with ACCEPTANCE_BONUS=0.20 the lesson
  graduates straight to RULE (stronger than PATTERN). Accept either state.
- test_integration_workflow: delete stale module-level skipif guarding 5
  tests behind ANTHROPIC/OPENAI keys they never actually use. They only
  exercise local brain.correct/convergence/efficiency — no network.
- test_mem0_adapter: delete test_real_mem0_roundtrip (live-API smoke test
  already covered by the 20+ fake-client tests in the same file).
- test_meta_rules: delete test_with_real_data — dev-time exploration
  script with zero asserts, requiring GRADATA_LESSONS_PATH env var.

Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed.

Remaining 3 skips are test_file_lock.py POSIX paths that require fcntl,
which does not exist on Windows. Complementary Windows paths skip on
Linux — running on each platform covers all 4. Cannot be eliminated.

From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>

* review: address 3 CRITICAL + 3 HIGH from PR #126 review

CRITICAL fixes:
- C1: rewrite meta_rules.py module docstring. It still said 'require
  Gradata Cloud' / 'no-ops in the open-source build' which directly
  contradicted the local-first implementation in the same file. Now
  describes the real algorithm. Closes LEGACY_CLEANUP item #3.
- C2: drop owner-name string from _NOISE_PATTERNS. The other entries
  are format-based (cut:/added:/content change) and filter just fine.
- C3: generalize the name-prefix strip regex in _build_principle from
  hardcoded 'Oliver:' to a generic 'Name:' pattern.

HIGH fixes:
- H1: update _update_lesson_confidence docstring to stop quoting the
  old -0.25 number and instead point at the canonical constants.
- H2: _apply_decay no longer mutates MetaRule in place — uses
  dataclasses.replace() so refresh_meta_rules' persisted inputs aren't
  silently modified.
- H3: add a comment explaining why the call-site threshold=0.20 is
  intentionally looser than _cluster_by_similarity's 0.35 default
  (category pre-filter handles most noise, recall matters more here).
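
The H2 change follows the usual copy-on-write pattern for dataclasses. A sketch under assumptions: the `MetaRule` fields and the decay factor here are illustrative stand-ins, not the real definitions.

```python
from dataclasses import dataclass, replace


@dataclass
class MetaRule:  # hypothetical minimal shape
    principle: str
    confidence: float


def apply_decay(rule: MetaRule, factor: float = 0.9) -> MetaRule:
    """Return a decayed copy; the caller's instance is left untouched."""
    return replace(rule, confidence=rule.confidence * factor)
```

Because `dataclasses.replace()` builds a new instance, any list of rules loaded from disk and passed in keeps its original confidence values.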

Suite clean on touched areas.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat: context-pressure handoff watchdog + multimodal RAG embedder protocol

Closes #127: HandoffWatchdog fires a preemptive resume-doc at 0.65 pressure
(GRADATA_HANDOFF_THRESHOLD override), writes a compact Markdown handoff,
and emits a handoff.triggered event so auto-compaction isn't the first
signal the agent is out of budget.

Closes #128: MultimodalEmbedder Protocol + MultimodalInput validation +
TextOnlyEmbedder default + embed_any router. User supplies their own
multimodal provider (Gemini, Voyage, CLIP); Gradata never hosts the
endpoint. Falls back to text-only when no multimodal embedder is
configured.

Both are provider-agnostic, local-first, and covered by unit tests
(18 handoff + 20 embedder). Full suite: 3853 passed, 3 skipped.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix: address code review on PR #130

- HandoffWatchdog._fired now init=False/repr=False/compare=False so the
  guard cannot be bypassed via constructor and doesn't leak into equality.
- _hash_vector zero-norm branch now returns a zero vector instead of an
  unnormalised one, honouring the Protocol's normalisation contract.
- Add test covering the handoff.triggered event emission path so a
  _events.emit signature drift can't silently regress.
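
The effect of the `_fired` field options can be seen on a stand-in class. `Watchdog` and `check` below are hypothetical and much smaller than HandoffWatchdog; treating the threshold as inclusive is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Watchdog:  # hypothetical stand-in for HandoffWatchdog
    threshold: float = 0.65
    # init=False: cannot be set via the constructor.
    # repr=False / compare=False: kept out of repr() and equality.
    _fired: bool = field(default=False, init=False, repr=False, compare=False)

    def check(self, pressure: float) -> bool:
        """Fire at most once, the first time pressure reaches the threshold."""
        if self._fired or pressure < self.threshold:
            return False
        self._fired = True
        return True
```

With these options, `Watchdog(_fired=True)` raises `TypeError`, and a fired instance still compares equal to a fresh one with the same threshold.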

Co-Authored-By: Gradata <noreply@gradata.ai>

* chore(tests): remove private-hook test leaking into public SDK

test_capture_rule_failure.py reached out of Gradata/ via parents[4] to
load .claude/hooks/reflect/scripts/capture_learning.py — a private
Claude Code hook that is not part of the public SDK. The test would
skip on every machine except the author's worktree, adding a phantom
"skipped" count in CI for every downstream user.

If we want coverage for the matcher, rewrite it as a pure unit test
against a function exposed by the SDK, or keep it on the private side
next to the hook it exercises.

Suite after removal: 3854 passed, 2 skipped (the two legitimate POSIX
tests in test_file_lock.py that run on Linux CI).

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(hooks): SessionStart hook injects handoff into next agent

Wires the watchdog to the next agent's context: when HandoffWatchdog
fires and writes a handoff doc, the new SessionStart hook loads the
most recent unconsumed *.handoff.md from {brain_dir}/handoffs/, wraps
it in <handoff>...</handoff>, and returns it to Claude Code. The agent
sees the handoff before brain-rules (primacy) and picks up where the
prior agent left off.

After injection the file moves to handoffs/consumed/ so the next
session won't re-inject it. Oversized bodies are truncated
(GRADATA_HANDOFF_MAX_CHARS, default 4000). Embedded </handoff> literals
are escaped so a hostile body cannot close our wrapper early.
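
A sketch of the truncate, escape, and wrap steps, assuming an HTML-entity escape and a `[truncated]` marker; the actual escape form and marker text used by the hook are not stated here, and `wrap_handoff` is a hypothetical name.

```python
import os
from typing import Optional


def wrap_handoff(body: str, max_chars: Optional[int] = None) -> str:
    """Truncate an oversized body, neutralise closing tags, then wrap it."""
    if max_chars is None:
        max_chars = int(os.environ.get("GRADATA_HANDOFF_MAX_CHARS", "4000"))
    if len(body) > max_chars:
        body = body[:max_chars] + "\n[truncated]"  # marker text is an assumption
    # Escape embedded closing tags so the body cannot end the wrapper early.
    body = body.replace("</handoff>", "&lt;/handoff&gt;")
    return f"<handoff>\n{body}\n</handoff>"
```

After escaping, the only literal `</handoff>` in the output is the one the wrapper itself appends.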

Helpers added to gradata.contrib.patterns.handoff:
  - default_handoff_dir(brain_dir) → Path  (canonical location)
  - pick_latest_unconsumed(dir) → Path | None
  - consume_handoff(path) → moves to consumed/ subdir

Tests: +16 hook tests + 9 helper tests = 41 total on handoff+hook.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(handoff): v2 rules-snapshot delta to save warm-resume tokens

Handoff now carries the timestamp of the rules the prior agent was
operating under. On next SessionStart, inject_handoff writes a
.handoff_active.json sentinel. inject_brain_rules reads it and, when
lessons.md has not changed since the snapshot, suppresses the ranked
<brain-rules> block — the handoff already carries that continuity.

Mandatory directives, disposition, meta-rules, and the brain_prompt
short-circuit still fire; only the ranked block is skipped. Gated by
GRADATA_HANDOFF_RULES_DELTA=1 (default on).

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(agent-precontext): dedup sub-agent rules against parent injection

Sub-agent spawns were re-injecting rules already present in the parent
session's context — measured ~500-2500 wasted tokens per multi-agent
workflow. agent_precontext now reads brain_dir/.last_injection.json
(written by inject_brain_rules on SessionStart) and skips any rule
whose full_id appears in the parent manifest.

Gated by GRADATA_SUBAGENT_DEDUP=1 (default on). Silent on missing
manifest — falls back to full injection. Matches the feature-flag
pattern used by the handoff-delta optimization.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(brain-prompt): cap brain_prompt.md at 4000 chars

brain_prompt.md had no size cap and grew unconstrained as the lesson
corpus matured, costing 500-3000 tokens per session on the primary
injection path. Add GRADATA_MAX_BRAIN_PROMPT_CHARS (default 4000)
with truncation marker, matching the inject_handoff pattern.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(context-inject): dedup FTS snippets against injected rules

context_inject fires on every UserPromptSubmit and returned FTS
snippets that frequently overlapped with rules already in the
<brain-rules> block — ~200-500 wasted tokens per prompt.

Drops any snippet with >70% Jaccard token overlap against an
injected rule description. Reads brain_dir/.last_injection.json
for the comparison corpus. Gated by GRADATA_CONTEXT_DEDUP=1 with
threshold override via GRADATA_CONTEXT_DEDUP_THRESHOLD.
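
The overlap check can be sketched as follows, assuming whitespace tokenisation and lower-casing; the real tokenizer and the helper names are not specified in this commit.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lower-cased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def dedup_snippets(snippets, rule_descriptions, threshold=0.70):
    """Keep only snippets that stay at or below the threshold against every rule."""
    return [
        s for s in snippets
        if all(jaccard(s, r) <= threshold for r in rule_descriptions)
    ]
```

For example, a snippet identical to an injected rule description scores 1.0 and is dropped, while one sharing three of five distinct tokens scores 0.6 and is kept at the default threshold.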

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(jit-inject): stop emitting events.jsonl on zero-match prompts

_emit_event ran unconditionally before the 'if not ranked: return'
guard, writing a JIT_INJECTION entry for every UserPromptSubmit even
when zero rules matched. Most prompts are zero-match, so this was
the dominant source of events.jsonl write amplification and
hot-path I/O overhead.

Moved the emit after the empty-guard so only successful injections
emit — matches the success-only pattern in inject_handoff.
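
The reordering amounts to moving the emit below the early return. A simplified sketch: `jit_inject`, its `emit` callback, and the event payload are hypothetical stand-ins for the hook and `_emit_event`.

```python
def jit_inject(ranked, emit):
    """Return the injection text, emitting an event only when rules matched."""
    if not ranked:
        # Zero-match prompt: return before any event is written.
        return None
    emit("JIT_INJECTION", {"count": len(ranked)})
    return "\n".join(ranked)
```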

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(hooks): opt-out env kill switches for 6 Stop/PreToolUse/SessionStart hooks

Projects with a superset JS replacement (e.g. the Sprites overlay) can now
disable the Python SDK hook without patching SDK source. Default is on —
setting the env var to "0" skips the hook and returns None.

Vars added (default "1"):
  GRADATA_BRAIN_MAINTAIN     — Stop,           brain_maintain.py
  GRADATA_SESSION_PERSIST    — Stop,           session_persist.py
  GRADATA_SECRET_SCAN        — PreToolUse,     secret_scan.py
  GRADATA_CONFIG_PROTECTION  — PreToolUse,     config_protection.py
  GRADATA_DUPLICATE_GUARD    — PreToolUse,     duplicate_guard.py
  GRADATA_CONFIG_VALIDATE    — SessionStart,   config_validate.py

secret_scan additionally emits a stderr warning when disabled — it is the
sole line of defense against credential commits, so a silent opt-out on a
misconfigured project is too risky.
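
The opt-out check can be sketched with a small wrapper. `run_hook` and its parameters are hypothetical; only the env var names, the default of "1", and the stderr warning for secret_scan come from this commit.

```python
import os
import sys


def run_hook(env_var, hook, payload, env=None, warn_when_disabled=False):
    """Run hook(payload) unless env_var is set to "0"; the default is on."""
    env = os.environ if env is None else env
    if env.get(env_var, "1") == "0":
        if warn_when_disabled:
            # secret_scan-style behaviour: never opt out silently.
            print(f"warning: {env_var}=0, hook disabled", file=sys.stderr)
        return None
    return hook(payload)
```

A project with a JS superset would export, for example, `GRADATA_SECRET_SCAN=0`; any other value, or an unset variable, leaves the Python hook active.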

Hook-overlap audit 2026-04-21 (.tmp/hook-overlap-audit-2026-04-21.md):
items 10-14 + 17. Eliminates ~8-20s per Stop, ~200-400 tok per edit,
~1500 tok per session of duplicate work when a JS superset is active.

Tests: 3908 passed, 2 skipped (baseline 3828/2, +80 from unrelated changes).

Co-Authored-By: Gradata <noreply@gradata.ai>

---------

Co-authored-by: Gradata <noreply@gradata.ai>

Gradata commented Apr 21, 2026


Superseded by main — meta_rules port, legacy-gate cleanup, and graduation bug fix have already landed on main via other merged PRs. Verified: after merging main into this branch with main winning conflicts, diff vs main is empty. The one orphan file (test_capture_rule_failure.py) tests a local .claude/ hook, not SDK source, and auto-skips when the hook isn't present. Closing.

Gradata closed this Apr 21, 2026