Skip to content

Audit: full codebase review — CodeRabbit vs Greptile#3

Closed
Gradata wants to merge 1 commit into
mainfrom
review/full-codebase-audit
Closed

Audit: full codebase review — CodeRabbit vs Greptile#3
Gradata wants to merge 1 commit into
mainfrom
review/full-codebase-audit

Conversation

@Gradata

@Gradata Gradata commented Apr 6, 2026

Copy link
Copy Markdown
Owner

Purpose

Compare CodeRabbit and Greptile review quality on the same PR.

Trigger file only — the real review is on the existing codebase context.

@coderabbitai summary

Codebase Context

  • SDK: src/gradata/ (33K LOC, 114 files)
  • Tests: 1339 passing
  • Nervous system: EventBus, embeddings, session history, rule ranker
  • Hooks: inject-brain-rules.js, tool-finding-capture.js, session-history-sync.js

After Review

Compare findings side by side. Close without merging. Delete trigger file.

Generated with Gradata

Greptile Summary

This PR adds a single-line markdown file (.coderabbit-review.md) as a trigger artifact to initiate a side-by-side comparison of CodeRabbit vs Greptile review quality on the existing gradata codebase. The file itself has no functional impact on the SDK, tests, or any runtime behaviour.

  • The only change is a new file containing # Full Codebase Audit Trigger — no code, config, or dependency modifications.
  • The PR description explicitly states it will be closed without merging and the trigger file deleted after the comparison is complete.
  • None of the project's code-review rules (logging, type annotations, path utilities, EventBus safety, from __future__ import annotations) apply to a static markdown file.
  • No issues found; this PR is safe to close as intended.

Confidence Score: 5/5

This PR is safe to close — the single added file is a no-op markdown document with zero runtime impact.

Score of 5 reflects that the only change is a trivial static markdown file; it introduces no code, no configuration, and no dependencies. All project code-review rules are scoped to Python files and do not apply here.

No files require special attention.

Important Files Changed

Filename Overview
.coderabbit-review.md Adds a single-line markdown trigger file; no functional impact, intended to be closed without merging.

Sequence Diagram

sequenceDiagram
    participant Author as PR Author
    participant GitHub as GitHub PR #3
    participant CR as CodeRabbit
    participant GP as Greptile
    Author->>GitHub: Push .coderabbit-review.md trigger file
    GitHub->>CR: Trigger full codebase review
    GitHub->>GP: Trigger full codebase review
    CR-->>GitHub: Post CodeRabbit review findings
    GP-->>GitHub: Post Greptile review findings
    Author->>GitHub: Compare reviews side-by-side
    Author->>GitHub: Close PR without merging
    Author->>GitHub: Delete trigger file
Loading

Reviews (1): Last reviewed commit: "chore: trigger full codebase review by C..." | Re-trigger Greptile

(4/5) You can add custom instructions or style guidelines for the agent here!

@coderabbitai

coderabbitai Bot commented Apr 6, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9bd4a443-5e21-4452-8cc7-14093bc47fb2

📥 Commits

Reviewing files that changed from the base of the PR and between 4f3f1a4 and 2a555cb.

📒 Files selected for processing (1)
  • .coderabbit-review.md
📜 Recent review details
🔇 Additional comments (1)
.coderabbit-review.md (1)

1-1: Change matches the PR intent and is safe.

This trigger line is clear and introduces no functional or operational risk.


📝 Walkthrough
  • Adds .coderabbit-review.md trigger file containing a single-line header
  • No code changes, configuration modifications, or functional alterations
  • No breaking changes, security fixes, or new public API methods
  • Designed to initiate a codebase audit comparison between CodeRabbit and Greptile review tools

Walkthrough

A new configuration file .coderabbit-review.md is added with a single directive to trigger a full codebase audit. No functional code or other files are modified.

Changes

Cohort / File(s) Summary
Configuration
.coderabbit-review.md
Added new configuration file containing a directive to trigger full codebase audit.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

docs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Audit: full codebase review — CodeRabbit vs Greptile' accurately reflects the main purpose of the PR, which is to trigger a side-by-side comparison of two code review tools on the codebase.
Description check ✅ Passed The description clearly explains the PR's purpose, codebase context, and intended workflow, directly relating to the trigger file change and comparison objective.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch review/full-codebase-audit

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the docs label Apr 6, 2026
@Gradata Gradata closed this Apr 6, 2026
@Gradata Gradata deleted the review/full-codebase-audit branch April 6, 2026 06:35
@Gradata

Gradata commented Apr 6, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review src/gradata/brain.py

@coderabbitai

coderabbitai Bot commented Apr 6, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the brain.py file
cat src/gradata/brain.py

Length of output: 44645


@Gradata Here is a thorough review of src/gradata/brain.py:


Review: src/gradata/brain.py

🔴 Critical


1. spawn_queue() violates the class's own thread-safety contract

The module docstring explicitly states:

"Brain is NOT thread-safe … does not support concurrent writes from multiple threads."

Yet spawn_queue() runs workers concurrently via ThreadPoolExecutor, and those workers almost certainly call back into brain.* methods (corrections, events, etc.), which issue SQLite writes:

def spawn_queue(self, tasks, worker, *, max_concurrent=3, ...):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {pool.submit(_run, t): t for t in tasks}  # concurrent brain writes!

Either the method must document that worker must not call any Brain method, or it should use ProcessPoolExecutor (with the brain re-opened per process) or serialize calls through a single-threaded queue.


2. export() silently ignores its output_path parameter

The parameter is accepted but never passed to the underlying implementation:

def export(self, output_path: str | None = None, mode: str = "full") -> Path:
    from gradata._export_brain import export_brain
    return export_brain(
        include_prospects=(mode != "no-prospects"),
        domain_only=(mode == "domain == "domain-only"),
        ctx=self.ctx,
        # ← output_path is never forwarded
    )

Any caller relying on brain.export("./backups/snapshot.zip") will silently get the default path. output_path should be forwarded or removed from the signature.


3. _resolve_pending() — connection leak on exception

The connection is closed on the two early-return paths, but if write_lessons_safe or format_lessons raises an exception inside the with lessons_lock(...) block, the connection is never closed:

conn = get_connection(self.db_path)
# ... early returns call conn.close() ✓
with lessons_lock(lessons_path):
    ...
    write_lessons_safe(...)  # ← exception here leaks conn
# conn.close() only reached if everything succeeds

Wrap the entire body in try/finally:

conn = get_connection(self.db_path)
try:
    ...
finally:
    conn.close()

🟠 High


4. set_brain_dir() mutates global state — multi-instance unsafe

Every Brain.__init__ calls set_brain_dir(self.dir, working_dir), which presumably writes to a module-level global. If two Brain instances are created in the same process (e.g. in tests or a multi-tenant server), the second silently hijacks the global path used by all SDK modules. The BrainContext / DI path (self.ctx) exists precisely to avoid this, yet the global-mutation call remains:

# Build immutable context for this brain instance (DI path)
self.ctx = BrainContext.from_brain_dir(self.dir, working_dir)

# Point all SDK modules to this brain directory (backward compat path)
set_brain_dir(self.dir, working_dir)  # ← overwrites global for ALL instances

The "backward compat" comment suggests this is known tech-debt. Consider emitting a DeprecationWarning and tracking removal in a future milestone.


5. __init__ is far too large (~120 lines)

The constructor performs: path resolution, encryption setup, taxonomy reload, config reload, schema migrations, registry init, pipeline warm-start, rule-context bootstrap, event-bus wiring, and session-history subscription. This makes it impossible to unit-test any individual concern in isolation and slows down every Brain() call even when only a subset of features is needed.

Consider extracting a private _bootstrap() method or a builder/factory that composes only the needed subsystems.


🟡 Medium


6. apply_brain_rules() reads lessons.md from disk on every call

lessons_path = self._find_lessons_path()
if not lessons_path:
    return ""
lessons = parse_lessons(lessons_path.read_text(encoding="utf-8"))  # disk hit every time

In a tight agent loop this is called on every prompt. Cache the parsed lessons with an mtime-based invalidation (similar to how .embed-manifest.json tracks file hashes) so repeated calls within a session are served from memory.


7. _budget_op() opens a new SQLite connection on every call

def _budget_op(self, fn_name, *args):
    conn = _db.get_connection(self.db_path)
    _db.ensure_credit_budgets(conn)
    try:
        return getattr(_db, fn_name)(conn, *args)
    finally:
        conn.close()

check_budget and spend_budget can be called many times per session. Opening and closing a connection each time adds unnecessary overhead. Prefer a persistent connection held by the Brain instance, or at minimum reuse the connection across _budget_op calls within a request.


8. memory property lazy-init is not thread-safe

`@property`
def memory(self):
    if not hasattr(self, "_memory_manager"):
        from gradata.contrib.patterns.memory import MemoryManager
        self._memory_manager = MemoryManager()   # race condition
    return self._memory_manager

Two threads checking hasattr simultaneously could each construct a MemoryManager. Use __init__ initialization or a threading.Lock.


9. _find_lessons_path() — confusing create=True control flow

The break inside the for loop is non-obvious and may confuse future editors:

for p in (...):
    if p.is_file():
        if not create:
            return p
        break  # Found external but create=True → fall through to create local
if create:
    brain_local.parent.mkdir(...)
    brain_local.write_text(...)
    return brain_local

Extract a helper or add a named boolean (found_external = True) to make the intent clear.


10. lineage() bypasses the _db module

Every other method uses get_connection(self.db_path) from gradata._db, but lineage() uses a raw sqlite3.connect(). This means it misses any connection setup (PRAGMA settings, row factories, WAL mode, etc.) applied in get_connection.


🔵 Low / Style


11. import os as _os inside __init__

import os as _os
self._encryption_key = None
if encryption_key or _os.environ.get("GRADATA_ENCRYPTION_KEY"):

This is a one-off aliased import inside the constructor body. Move it to the module-level imports.


12. Semicolon-joined statements in _resolve_pending and review_pending

import sqlite3; conn.row_factory = sqlite3.Row

PEP 8 discourages multiple statements on one line. Split into two lines.


13. observe() instantiates MemoryExtractor on every call

def observe(self, messages, user_id="default"):
    extractor = MemoryExtractor()   # new object every call
    candidates = extractor.extract(messages)

If MemoryExtractor is stateless, initialize it once in __init__ and reuse it.


14. stats() rescans all markdown files on every call

The md_count line uses rglob("*.md") which traverses the entire directory tree each time stats() is called. Cache this value or compute it lazily.


Summary Table

# Severity Issue
1 🔴 Critical spawn_queue uses threads despite NOT thread-safe contract
2 🔴 Critical export() silently ignores output_path parameter
3 🔴 Critical _resolve_pending() connection leak on exception
4 🟠 High set_brain_dir() global state mutation breaks multi-instance use
5 🟠 High __init__ too large (~120 lines), not unit-testable
6 🟡 Medium apply_brain_rules() reads disk on every call — no caching
7 🟡 Medium _budget_op() opens a new DB connection on every call
8 🟡 Medium memory property has a thread-unsafe lazy-init race
9 🟡 Medium _find_lessons_path(create=True) control flow is confusing
10 🟡 Medium lineage() bypasses _db module, uses raw sqlite3.connect()
11 🔵 Low import os as _os inside constructor; move to module level
12 🔵 Low Semicolon-joined statements violate PEP 8
13 🔵 Low MemoryExtractor re-instantiated on every observe() call
14 🔵 Low stats() rescans all markdown files on every call

Gradata added a commit that referenced this pull request Apr 21, 2026
CRITICAL fixes:
- C1: rewrite meta_rules.py module docstring. It still said 'require
  Gradata Cloud' / 'no-ops in the open-source build' which directly
  contradicted the local-first implementation in the same file. Now
  describes the real algorithm. Closes LEGACY_CLEANUP item #3.
- C2: drop owner-name string from _NOISE_PATTERNS. The other entries
  are format-based (cut:/added:/content change) and filter just fine.
- C3: generalize the name-prefix strip regex in _build_principle from
  hardcoded 'Oliver:' to a generic 'Name:' pattern.

HIGH fixes:
- H1: update _update_lesson_confidence docstring to stop quoting the
  old -0.25 number and instead point at the canonical constants.
- H2: _apply_decay no longer mutates MetaRule in place — uses
  dataclasses.replace() so refresh_meta_rules' persisted inputs aren't
  silently modified.
- H3: add a comment explaining why the call-site threshold=0.20 is
  intentionally looser than _cluster_by_similarity's 0.35 default
  (category pre-filter handles most noise, recall matters more here).

Suite clean on touched areas.

Co-Authored-By: Gradata <noreply@gradata.ai>
Gradata added a commit that referenced this pull request Apr 21, 2026
…133)

* feat(sync): transform local rows to cloud schema + scrub JSONB payloads

Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json`
vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns`
-> `corrections`). Added `_transform_row` per-table mapper with deterministic
uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone
UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each
batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's
opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` /
`GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both
backend and SDK.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(pipeline): canonical graduation + persistent brain_prompt + two-provider synth

Phase 1 of the learning-pipeline revamp. Rule graduation now flows through
the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN,
>= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline.
Injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED
header, regenerated only at session_close after the pipeline fires. LLM
synthesis gets a two-provider path: anthropic SDK (ANTHROPIC_API_KEY) with
claude CLI fallback (Max-plan OAuth) so users without an exportable key
still get synthesis. Meta-rule deterministic fallback now warns loudly
instead of silently discarding. Drops five env-flag gates in favour of
file-based signals.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(doctor): add cloud-health probing to gradata doctor

Adds --cloud / --no-cloud flags to the doctor CLI command and the
underlying diagnose() function. Flips the default cloud endpoint to
api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py
(all passing).

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(implicit_feedback): catch text-speak corrections (r/u/dont/cant)

Regex coverage was brittle to shorthand: real corrections like
"Why r you not asking" and "Why flag.. we dont skip" slipped the
\bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK
events. That silently breaks Gradata's core promise ("learn from any
correction").

Adds:
- negation: dont/cant/shouldnt (no-apostrophe variants), never
- reminder: "again" marker, "dont forget"
- challenge: "why r u", "why not/r/are/is/does", "why word..",
  "how come", "you missed/forgot/failed/didnt"

All 8 target phrases now detect. 25 existing implicit-feedback tests
remain green.

Co-Authored-By: Gradata <noreply@gradata.ai>

* test(implicit_feedback): cover text-speak and multi-signal inputs

14 new tests pinning the regex expansion from 5a6da45. Covers real
corrections observed this session ("Why r you not asking council",
"Why flag.. we don't skip we do work") plus shorthand cases
(dont / cant / again / you missed / how come). Dual-signal cases
assert both types detect. Full suite: 37 passed, 1 pre-existing skip.

Co-Authored-By: Gradata <noreply@gradata.ai>

* docs: add pre-launch plan with numeric pivot/kill/scale triggers

Five post-launch metrics with precise definitions (activation, D7
retention, time-to-first-graduation, free->Pro conversion,
correction-rate decay). Numeric triggers: pivot <20% activation +
flat decay at D30; kill <100 installs at D60; scale >1K installs +
>=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8
of the pre-launch gap analysis.

Co-Authored-By: Gradata <noreply@gradata.ai>

* docs(meta_rules): llm_synth now runs locally, not cloud-side

The source-provenance docstring referenced "cloud-side LLM synthesis"
which is stale since the graduation-cloud-gate was removed. Synthesis
runs on the user's machine via rule_synthesizer.py's two-provider path
(Anthropic SDK with user's key, or Claude Code Max CLI OAuth).

Co-Authored-By: Gradata <noreply@gradata.ai>

* docs(marketing): correct stale cloud-graduation claims in Pro tier

Graduation and meta-rule LLM synthesis run entirely locally as of a
few sessions ago (rule_synthesizer.py uses user's own Anthropic key or
Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly
still claimed "cloud runs better graduation engine" and implied a
cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy
paragraph to match reality: free is functionally complete; Pro is
visualization, history, export, and the future community corpus.

NOTE: this file is listed in .gitignore per the earlier
"untrack private files" cleanup. Force-added at request.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(tests): assert brain_id not tenant_id in cloud push test

Test was checking the pre-transform local key name. _cloud_sync._transform_row
correctly emits brain_id (cloud schema) from tenant_id (local schema); the
assertion was stale.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(lesson_applications): close the compound-quality audit loop

Previously nothing wrote to lesson_applications — the table existed
(onboard.py), was size-checked (_validator.py), and synced to cloud
(_cloud_sync.py), but no code ever inserted a row. The compound-quality
story had no evidence: rules claimed to fire with no receipt.

Now:
- inject_brain_rules writes one PENDING row per injected rule (cluster
  members included), storing {category, description, task} in context so
  session_close can attribute outcomes back to specific rules.
- session_close resolves PENDING rows at end-of-waterfall:
    REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the
    session shares the lesson's category (or description substring).
    CONFIRMED otherwise (rule survived the session).

Both paths are best-effort — DB missing, schema drift, or IO errors
degrade silently rather than blocking injection or session close.

Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually
fire and survive?" — the answer now has a row-level audit trail.

Co-Authored-By: Gradata <noreply@gradata.ai>

* docs: truth-pass cloud-vs-SDK boundary across architecture + concepts

Sweeps the remaining docs that still claimed cloud gated any part of
the learning loop. Actual architecture (as of the graduation-local
pivot):

  Local SDK owns: correction capture, graduation, meta-rule clustering
  AND LLM-synthesis (via user's Anthropic key or Claude Code Max OAuth),
  rule-to-hook promotion, manifest computation.

  Cloud owns: dashboard/visualization, cross-device sync, team brains,
  managed backups, future opt-in corpus donation.

Files touched:
- docs/cloud/overview.md — capability matrix, architecture diagram, use-when guidance.
- docs/architecture/cloud-monolith-v2.md — cloud-side workload framing.
- docs/architecture/multi-tenant-future-proofing.md — proprietary boundary, verification flow.
- docs/concepts/meta-rules.md — synthesis is local, not cloud-gated.
- docs/cloud/dashboard.md — dashboard visualizes local output, does not re-synthesize.

README.md was already accurate; no changes there.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(ultrareview): address 4-agent review before public push

Silent-failure-hunter CRITICAL-1:
- inject_brain_rules: wrap lesson_applications connection in try/finally
  and escalate OperationalError to warning (missing-table surfaces).

Silent-failure-hunter CRITICAL-2:
- _cloud_sync.push: per-row try/except on _transform_row so one bad row
  no longer propagates and kills the whole push batch.

Leak scan blockers:
- Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md
  from the public repo; add both to .gitignore. These contain kill
  triggers, pricing, and PII that belong in the private brain vault only.

Code-reviewer BLOCKER-3:
- _doctor._check_vector_store returns status="ok" with FTS5 detail in
  the detail field, restoring the documented status vocabulary
  ({ok, warn, fail, skip, missing, error}).

Test-coverage gaps:
- Add tests/test_rule_synthesizer.py — both providers absent, empty
  input, cache hit, CLI fallback on SDK raise, malformed output.
- Add IMPLICIT_FEEDBACK → REJECTED integration test to
  test_lesson_applications.py.

Verification: full suite 3802 pass, 22 skip, 2 xfailed.

* feat(meta_rules): port local-first discovery, unskip cloud-gated tests

Gradata is fully local-first now. Cloud-gate stubs and "requires cloud"
skip markers were legacy artifacts from an earlier architecture where
discovery/synthesis lived server-side. This commit finishes the port:

- meta_rules.discover_meta_rules + merge_into_meta run locally:
  category grouping + greedy semantic-similarity clustering, zombie
  filter on RULE-state lessons below 0.90, decay after 20 sessions,
  count/(count+3) confidence smoothing.
- Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer,
  test_meta_rule_generalization, test_multi_brain_simulation,
  test_pipeline_e2e. These tests now exercise the local impl directly.
- Retire the api_key-kwarg-on-merge_into_meta path (session-close
  rule_synthesizer drives LLM distillation now).
- Update fixtures to realistic prose so they survive the noise filter
  that rejects "cut:/added:" edit-distance summaries.
- Bump test_meta_rules confidence assertion to the smoothed formula.
- Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges
  (deprecated adapter shims, cloud docs, stale module docstrings).

Suite: 3809 passed, 14 skipped, 2 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>

* test(pipeline_e2e): remove stale 'not yet implemented' skips, bump fixtures

discover_meta_rules is implemented now (local-first). The
  if not metas: pytest.skip('discover_meta_rules not yet implemented')
guards were vestiges from the cloud-only era — convert to real asserts.

Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive
the zombie filter (RULE at <0.90 is treated as a decayed rule).

Suite: 3813 passed, 10 skipped, 2 xfailed.

Remaining skips are all legit:
- test_file_lock.py (2): Windows vs POSIX platform gates
- test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money
- test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY
- test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env

xfails (2) are tracked for v0.7 reconciliation in test docstring.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(graduation): correct MISFIRE_PENALTY sign in agent_graduation

Found while clearing remaining skipped/xfailed tests:

Bug: agent_graduation._update_lesson_confidence had
  confidence = max(0.0, confidence - MISFIRE_PENALTY)
but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added
confidence on rejection. Test test_rejection_decreases_confidence was
xfail'd with 'API drift, reconcile in v0.7' — it was a real bug.

Fix: align with canonical _confidence.py usage (confidence + MISFIRE_PENALTY).

Other cleanups in the same pass:

- test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern
  was also wrong on its own terms — with ACCEPTANCE_BONUS=0.20 the lesson
  graduates straight to RULE (stronger than PATTERN). Accept either state.
- test_integration_workflow: delete stale module-level skipif guarding 5
  tests behind ANTHROPIC/OPENAI keys they never actually use. They only
  exercise local brain.correct/convergence/efficiency — no network.
- test_mem0_adapter: delete test_real_mem0_roundtrip (live-API smoke test
  already covered by the 20+ fake-client tests in the same file).
- test_meta_rules: delete test_with_real_data — dev-time exploration
  script with zero asserts, requiring GRADATA_LESSONS_PATH env var.

Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed.

Remaining 3 skips are test_file_lock.py POSIX paths that require fcntl,
which does not exist on Windows. Complementary Windows paths skip on
Linux — running on each platform covers all 4. Cannot be eliminated.

From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>

* review: address 3 CRITICAL + 3 HIGH from PR #126 review

CRITICAL fixes:
- C1: rewrite meta_rules.py module docstring. It still said 'require
  Gradata Cloud' / 'no-ops in the open-source build' which directly
  contradicted the local-first implementation in the same file. Now
  describes the real algorithm. Closes LEGACY_CLEANUP item #3.
- C2: drop owner-name string from _NOISE_PATTERNS. The other entries
  are format-based (cut:/added:/content change) and filter just fine.
- C3: generalize the name-prefix strip regex in _build_principle from
  hardcoded 'Oliver:' to a generic 'Name:' pattern.

HIGH fixes:
- H1: update _update_lesson_confidence docstring to stop quoting the
  old -0.25 number and instead point at the canonical constants.
- H2: _apply_decay no longer mutates MetaRule in place — uses
  dataclasses.replace() so refresh_meta_rules' persisted inputs aren't
  silently modified.
- H3: add a comment explaining why the call-site threshold=0.20 is
  intentionally looser than _cluster_by_similarity's 0.35 default
  (category pre-filter handles most noise, recall matters more here).

Suite clean on touched areas.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat: context-pressure handoff watchdog + multimodal RAG embedder protocol

Closes #127: HandoffWatchdog fires a preemptive resume-doc at 0.65 pressure
(GRADATA_HANDOFF_THRESHOLD override), writes a compact Markdown handoff,
and emits a handoff.triggered event so auto-compaction isn't the first
signal the agent is out of budget.

Closes #128: MultimodalEmbedder Protocol + MultimodalInput validation +
TextOnlyEmbedder default + embed_any router. User supplies their own
multimodal provider (Gemini, Voyage, CLIP); Gradata never hosts the
endpoint. Falls back to text-only when no multimodal embedder is
configured.

Both are provider-agnostic, local-first, and covered by unit tests
(18 handoff + 20 embedder). Full suite: 3853 passed, 3 skipped.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix: address code review on PR #130

- HandoffWatchdog._fired now init=False/repr=False/compare=False so the
  guard cannot be bypassed via constructor and doesn't leak into equality.
- _hash_vector zero-norm branch now returns a zero vector instead of an
  unnormalised one, honouring the Protocol's normalisation contract.
- Add test covering the handoff.triggered event emission path so a
  _events.emit signature drift can't silently regress.

Co-Authored-By: Gradata <noreply@gradata.ai>

* chore(tests): remove private-hook test leaking into public SDK

test_capture_rule_failure.py reached out of Gradata/ via parents[4] to
load .claude/hooks/reflect/scripts/capture_learning.py — a private
Claude Code hook that is not part of the public SDK. The test would
skip on every machine except the author's worktree, adding a phantom
\"skipped\" count in CI for every downstream user.

If we want coverage for the matcher, rewrite it as a pure unit test
against a function exposed by the SDK, or keep it on the private side
next to the hook it exercises.

Suite after removal: 3854 passed, 2 skipped (the two legitimate POSIX
tests in test_file_lock.py that run on Linux CI).

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(hooks): SessionStart hook injects handoff into next agent

Wires the watchdog to the next agent's context: when HandoffWatchdog
fires and writes a handoff doc, the new SessionStart hook loads the
most recent unconsumed *.handoff.md from {brain_dir}/handoffs/, wraps
it in <handoff>...</handoff>, and returns it to Claude Code. The agent
sees the handoff before brain-rules (primacy) and picks up where the
prior agent left off.

After injection the file moves to handoffs/consumed/ so the next
session won't re-inject it. Oversized bodies are truncated
(GRADATA_HANDOFF_MAX_CHARS, default 4000). Embedded </handoff> literals
are escaped so a hostile body cannot close our wrapper early.

Helpers added to gradata.contrib.patterns.handoff:
  - default_handoff_dir(brain_dir) → Path  (canonical location)
  - pick_latest_unconsumed(dir) → Path | None
  - consume_handoff(path) → moves to consumed/ subdir

Tests: +16 hook tests + 9 helper tests = 41 total on handoff+hook.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(handoff): v2 rules-snapshot delta to save warm-resume tokens

Handoff now carries the timestamp of the rules the prior agent was
operating under. On next SessionStart, inject_handoff writes a
.handoff_active.json sentinel. inject_brain_rules reads it and, when
lessons.md has not changed since the snapshot, suppresses the ranked
<brain-rules> block — the handoff already carries that continuity.

Mandatory directives, disposition, meta-rules, and the brain_prompt
short-circuit still fire; only the ranked block is skipped. Gated by
GRADATA_HANDOFF_RULES_DELTA=1 (default on).

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(agent-precontext): dedup sub-agent rules against parent injection

Sub-agent spawns were re-injecting rules already present in the parent
session's context — measured ~500-2500 wasted tokens per multi-agent
workflow. agent_precontext now reads brain_dir/.last_injection.json
(written by inject_brain_rules on SessionStart) and skips any rule
whose full_id appears in the parent manifest.

Gated by GRADATA_SUBAGENT_DEDUP=1 (default on). Silent on missing
manifest — falls back to full injection. Matches the feature-flag
pattern used by the handoff-delta optimization.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(brain-prompt): cap brain_prompt.md at 4000 chars

brain_prompt.md had no size cap and grew unconstrained as the lesson
corpus matured, costing 500-3000 tokens per session on the primary
injection path. Add GRADATA_MAX_BRAIN_PROMPT_CHARS (default 4000)
with truncation marker, matching the inject_handoff pattern.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(context-inject): dedup FTS snippets against injected rules

context_inject fires on every UserPromptSubmit and returned FTS
snippets that frequently overlapped with rules already in the
<brain-rules> block — ~200-500 wasted tokens per prompt.

Drops any snippet with >70% Jaccard token overlap against an
injected rule description. Reads brain_dir/.last_injection.json
for the comparison corpus. Gated by GRADATA_CONTEXT_DEDUP=1 with
threshold override via GRADATA_CONTEXT_DEDUP_THRESHOLD.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(jit-inject): stop emitting events.jsonl on zero-match prompts

_emit_event ran unconditionally before the 'if not ranked: return'
guard, writing a JIT_INJECTION entry for every UserPromptSubmit even
when zero rules matched. Most prompts are zero-match, so this was
the dominant source of events.jsonl write amplification and hot-
path I/O overhead.

Moved the emit after the empty-guard so only successful injections
emit — matches the success-only pattern in inject_handoff.

Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(hooks): opt-out env kill switches for 6 Stop/PreToolUse/SessionStart hooks

Projects with a superset JS replacement (e.g. the Sprites overlay) can now
disable the Python SDK hook without patching SDK source. Default is on —
setting the env var to "0" skips the hook and returns None.

Vars added (default "1"):
  GRADATA_BRAIN_MAINTAIN     — Stop,           brain_maintain.py
  GRADATA_SESSION_PERSIST    — Stop,           session_persist.py
  GRADATA_SECRET_SCAN        — PreToolUse,     secret_scan.py
  GRADATA_CONFIG_PROTECTION  — PreToolUse,     config_protection.py
  GRADATA_DUPLICATE_GUARD    — PreToolUse,     duplicate_guard.py
  GRADATA_CONFIG_VALIDATE    — SessionStart,   config_validate.py

secret_scan additionally emits a stderr warning when disabled — it is the
sole line of defense against credential commits, so a silent opt-out on a
misconfigured project is too risky.

Hook-overlap audit 2026-04-21 (.tmp/hook-overlap-audit-2026-04-21.md):
items 10-14 + 17. Eliminates ~8-20s per Stop, ~200-400 tok per edit,
~1500 tok per session of duplicate work when a JS superset is active.

Tests: 3908 passed, 2 skipped (baseline 3828/2, +80 from unrelated).

Co-Authored-By: Gradata <noreply@gradata.ai>

---------

Co-authored-by: Gradata <noreply@gradata.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant