chore: cleanup PRs B+C+D from CLEANUP_ROADMAP by Gradata · Pull Request #182 · Gradata/gradata

Gradata · 2026-05-06T22:47:11Z

Consolidates the remaining cleanup work from CLEANUP_ROADMAP.md. Follows up on PR #179 (Cleanup PR A).

PR B (Section 1+2):

Delete graduation/scoring.py + tests (alternate scoring spine, no production callers)
Move judgment_decay and rules_distillation to enhancements/experimental/ with back-compat shims
Factor shared correction-rate helper at _correction_metrics.py; both _manifest_metrics and scoring/correction_tracking call it

PR C (Section 3):

Delete rule_synthesizer.py + tests (no production callers)
Update meta_rules docstring to point at llm_synthesizer (the actual code path)
Add golden prompt fixture for llm_synthesizer._build_prompt

PR D (Section 6):

Rename integrations/agent_lightning to tuning/agent_lightning
Back-compat shim at old path with DeprecationWarning
Update cli.py, examples/tune_one_prompt.py, and tests to import from tuning/

Tests:

4181 passed, 11 skipped (codex sandbox blocked daemon socket bind)
Full suite will run on CI
Layering check: no Layer 0 to 2 imports introduced

Risk: moderate. All renames have shims, deletes verified by rg showing zero production callers.

Generated by codex/gpt-5.5 worker (proc_77cd9c62831a). Recovery patch landed via parent agent.

PR B (Section 1 + Section 2): - DELETE graduation/scoring.py + tests/test_graduation_scoring.py (alternate scoring spine, no production callers) - MOVE judgment_decay → enhancements/experimental/judgment_decay (with shim) - MOVE rules_distillation → enhancements/experimental/rules_distillation (with shim) - ADD shared correction-rate helper at src/gradata/_correction_metrics.py; _manifest_metrics + scoring/correction_tracking now call it (no formula duplication) PR C (Section 3): - DELETE rule_synthesizer.py + test_rule_synthesizer.py (no production callers — meta_rules docstring referenced it but actual code path is llm_synthesizer) - ADD golden prompt fixture for llm_synthesizer._build_prompt - UPDATE meta_rules docstring to point at llm_synthesizer (the live path) PR D (Section 6): - RENAME integrations/agent_lightning → tuning/agent_lightning - ADD back-compat shim at integrations/agent_lightning with DeprecationWarning - UPDATE cli.py + examples/tune_one_prompt.py + tests to import from tuning/ - LEAVE integrations/session_history.py shim alone (already handled in PR #179) Validation: - pytest -x -k 'not daemon_extended and not plugin_integration': 4181 passed, 11 skipped, 13 deselected - Full pytest -x blocked by sandbox socket bind in test_daemon_extended; will run on CI Layering check: no Layer 0 → 2 imports introduced. All renames have shims. Generated by codex/gpt-5.5 worker (proc_77cd9c62831a). Author: Oliver Le.

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

coderabbitai · 2026-05-06T22:47:24Z

📝 Walkthrough

Walkthrough

This PR restructures Agent-Lightning integration from gradata.integrations to a new gradata.tuning namespace, extracts shared correction-metric utilities, moves judgment-decay and rules-distillation implementations to an experimental module with deprecated shims in graduation, adds a new graduation scoring engine, and refactors LLM/session-close logic.

Changes

Core Refactor: Namespace Reorganization and Utility Extraction

Layer / File(s)	Summary
Shared Utilities `src/gradata/_correction_metrics.py`	New module exports `correction_rate()` helper for centralized correction-rate calculations with optional rounding.
Tuning Module Scaffold `src/gradata/tuning/__init__.py`, `src/gradata/tuning/agent_lightning/__init__.py`	New `gradata.tuning` package with lazy-loading `__getattr__` entry point for Agent-Lightning bridge.
Agent-Lightning Bridge (New) `src/gradata/tuning/agent_lightning/litagent.py`, `reward.py`, `runner.py`	Full implementations of `GradataLitAgent` wrapper, `gradata_reward` scoring, and `run_apo_tune` CLI runner relocated from integrations module.
Experimental Enhancements `src/gradata/enhancements/experimental/__init__.py`, `judgment_decay.py`, `rules_distillation.py`	New experimental module containing pure-computation implementations of judgment-decay (with session-type awareness, decay logic) and rules-distillation (grouping, coverage checking, proposal formatting).
Deprecated Shims `src/gradata/integrations/agent_lightning/`, `src/gradata/enhancements/graduation/judgment_decay.py`, `rules_distillation.py`	Old integration and graduation modules converted to deprecation-warning shims that re-export from new locations (tuning and experimental respectively).
Graduation Scoring Engine `src/gradata/enhancements/graduation/scoring.py`	New opt-in scoring module with blended component calculation (confidence, fire saturation, reliability veto, recency decay, maturity), cut-point-based state transitions, and `should_graduate_lesson` wrapper.
LLM & Session Utilities `src/gradata/enhancements/llm_synthesizer.py`	Extracted `_build_prompt()` helper to centralize LLM prompt template construction.
Session Hook Simplification `src/gradata/hooks/session_close.py`	Removed `_refresh_brain_prompt` functionality; graduation/pipeline waterfall now gated only by `_has_new_triggers`.
Correction Tracking Integration `src/gradata/enhancements/scoring/correction_tracking.py`, `src/gradata/_manifest_metrics.py`	Updated to use shared `correction_rate()` helper; no behavioral changes to correction-profile or manifest-metric computation.
Import Updates & CLI `src/gradata/cli.py`, `examples/tune_one_prompt.py`, `src/gradata/integrations/__init__.py`, `src/gradata/enhancements/meta_rules.py`, `llm_provider.py`	Updated `run_apo_tune` imports to use `gradata.tuning.agent_lightning`; added/clarified deprecation documentation.
Tests & Fixtures `tests/test_agent_lightning_bridge.py`, `test_graduation_scoring.py`, `test_llm_synthesizer.py`, `test_rule_synthesizer.py`, `fixtures/synthesize_prompt.golden.txt`	New comprehensive test coverage for graduation scoring logic, LLM prompt golden fixture, rule synthesizer fail-safe contracts; updated Agent-Lightning bridge tests to import from new module location.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Gradata/gradata#106: Modifies session_close hook with trigger-gating and .last_close_ts stamping logic.
Gradata/gradata#172: Refactors Agent-Lightning bridge and tuning CLI integration.
Gradata/gradata#93: Earlier merge that introduced tuning.agent_lightning, judgment_decay, rules_distillation, and related enhancements.

Suggested labels

feature, refactor

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'chore: cleanup PRs B+C+D from CLEANUP_ROADMAP' clearly and concisely summarizes the main change: consolidating cleanup work from a roadmap across multiple sections.
Description check	✅ Passed	The description is directly related to the changeset, providing detailed context about the cleanup work being consolidated, including specific sections (B, C, D), affected files, and validation results.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chore/cleanup-bcd

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.20.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

�[1m Loading rules from local config...�[0m
[00.27][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/enhancements/experimental/judgment_decay.py`:
- Around line 101-104: The membership check on session_type can silently skip
decay for typos/unexpected values; before calling session_type.lower() and
looking up CATEGORY_SESSION_TYPES, validate and normalize session_type (ensure
it's a str, strip/normalize casing) and explicitly guard unknown values: if the
normalized session_type is not present in CATEGORY_SESSION_TYPES keys and not in
ALL_SESSION_TYPES, return False (or log a warning) so unknown/typo values don't
make categories unintentionally non-testable; reference the session_type
variable, CATEGORY_SESSION_TYPES and ALL_SESSION_TYPES when applying this guard.

In `@Gradata/src/gradata/enhancements/experimental/rules_distillation.py`:
- Around line 88-99: The code picks a representative description using
cat_entries[-1] and builds sources via an unordered set, which is
non-deterministic; update the logic to deterministically choose the most recent
entry (e.g., select the entry with the max timestamp/created_at field using
max(cat_entries, key=lambda e: e.timestamp or e.created_at)) and set
representative = that_entry.description, and replace sources = list({e.source
for e in cat_entries}) with a sorted deterministic list (e.g., sources =
sorted({e.source for e in cat_entries}) or sorted({e.source for e in
cat_entries}, key=some_stable_key)) so both representative selection and
evidence_sources are stable across runs.

In `@Gradata/src/gradata/tuning/agent_lightning/litagent.py`:
- Around line 109-111: The current except block in litagent.py that catches
formatting failures for PromptTemplate (around the PromptTemplate.format call
which returns str(prompt_template.template)) only logs at debug level and hides
tracebacks; change the handler in the except Exception as exc within the
function/method that uses prompt_template.template to call logger.warning with a
descriptive message and include exc_info=True (to emit the traceback), keeping
the fallback return of str(prompt_template.template); do not suppress the
exception or remove the existing typed Exception catch.
- Around line 100-103: The loop in litagent.py currently treats any string from
resources.values() as a prompt template (checking isinstance(value, str)) which
can pick unrelated metadata; change the condition to only accept real
prompt-template objects (e.g., objects exposing a template attribute or
instances of the actual PromptTemplate class your project uses) and remove the
isinstance(value, str) branch so only values with hasattr(value, "template") or
isinstance(value, PromptTemplate) are returned; update imports to reference
PromptTemplate (or the concrete template class used) and leave fallback to
self.prompt_template unchanged.

In `@Gradata/src/gradata/tuning/agent_lightning/reward.py`:
- Around line 57-72: The current except blocks for brain.search and
brain.query_events suppress errors with logger.debug and lose traceback; update
the except handlers in reward lookup (around brain.search and brain.query_events
in reward.py) to log at warning level and include the exception traceback (pass
exc_info=True or use logger.warning(..., exc_info=exc)) while preserving
existing fallback behavior (setting results = [] and history = []), so failures
are visible in production for functions involving brain.search,
brain.query_events, and subsequent processing like _event_from_search_result.

In `@Gradata/src/gradata/tuning/agent_lightning/runner.py`:
- Around line 206-208: The except block that currently does "except Exception as
exc" then logs at debug and returns fallback should instead log a warning
including the traceback so failures aren't silently hidden; update the handler
around the APO best-prompt retrieval to call logger.warning with a clear message
and exc_info=True (keeping the "except Exception as exc" and the subsequent
"return fallback") so the exception and stacktrace for the best-prompt fallback
are recorded (refer to symbols APO, fallback, and logger in runner.py).
- Around line 148-152: The code currently only looks at event.get("data") for
draft/final extraction and skips events that store payload in data_json; update
the extraction to also check event.get("data_json") as a fallback: if
event.get("data") is not a dict, attempt to use event.get("data_json") (if it's
a dict use it directly; if it's a JSON string parse it with json.loads) before
calling _first_text for draft_text/draft/task_input/input and
final_text/final/expected/correction. Ensure you still skip when neither source
yields a draft or final and keep using the existing _first_text calls and
variable names (data, draft, final).

In `@Gradata/tests/test_llm_synthesizer.py`:
- Around line 129-131: The test currently builds the fixture path with
Path("tests/...") which breaks when CWD isn't the repo root; update the code
that sets expected (in test_llm_synthesizer.py) to resolve the fixture relative
to the test file by using Path(__file__).parent (or appropriate ancestor) joined
with "fixtures/synthesize_prompt.golden.txt" before calling read_text so the
fixture lookup is independent of the current working directory.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ef933b46-7a6e-49c0-b452-68cc6820f840

📥 Commits

Reviewing files that changed from the base of the PR and between 86c972f and f5cf6fc.

📒 Files selected for processing (31)

Gradata/examples/tune_one_prompt.py
Gradata/src/gradata/_correction_metrics.py
Gradata/src/gradata/_manifest_metrics.py
Gradata/src/gradata/cli.py
Gradata/src/gradata/enhancements/experimental/__init__.py
Gradata/src/gradata/enhancements/experimental/judgment_decay.py
Gradata/src/gradata/enhancements/experimental/rules_distillation.py
Gradata/src/gradata/enhancements/graduation/judgment_decay.py
Gradata/src/gradata/enhancements/graduation/rules_distillation.py
Gradata/src/gradata/enhancements/graduation/scoring.py
Gradata/src/gradata/enhancements/llm_provider.py
Gradata/src/gradata/enhancements/llm_synthesizer.py
Gradata/src/gradata/enhancements/meta_rules.py
Gradata/src/gradata/enhancements/rule_synthesizer.py
Gradata/src/gradata/enhancements/scoring/correction_tracking.py
Gradata/src/gradata/hooks/session_close.py
Gradata/src/gradata/integrations/__init__.py
Gradata/src/gradata/integrations/agent_lightning/__init__.py
Gradata/src/gradata/integrations/agent_lightning/litagent.py
Gradata/src/gradata/integrations/agent_lightning/reward.py
Gradata/src/gradata/integrations/agent_lightning/runner.py
Gradata/src/gradata/tuning/__init__.py
Gradata/src/gradata/tuning/agent_lightning/__init__.py
Gradata/src/gradata/tuning/agent_lightning/litagent.py
Gradata/src/gradata/tuning/agent_lightning/reward.py
Gradata/src/gradata/tuning/agent_lightning/runner.py
Gradata/tests/fixtures/synthesize_prompt.golden.txt
Gradata/tests/test_agent_lightning_bridge.py
Gradata/tests/test_graduation_scoring.py
Gradata/tests/test_llm_synthesizer.py
Gradata/tests/test_rule_synthesizer.py

💤 Files with no reviewable changes (5)

Gradata/src/gradata/hooks/session_close.py
Gradata/tests/test_rule_synthesizer.py
Gradata/src/gradata/enhancements/rule_synthesizer.py
Gradata/tests/test_graduation_scoring.py
Gradata/src/gradata/enhancements/graduation/scoring.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: pytest macos-latest / py3.11
GitHub Check: pytest ubuntu-latest / py3.12
GitHub Check: pytest windows-latest / py3.11
GitHub Check: pytest ubuntu-latest / py3.11
GitHub Check: pytest macos-latest / py3.12
GitHub Check: pytest windows-latest / py3.12

🧰 Additional context used

📓 Path-based instructions (2)

Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

Gradata/src/gradata/enhancements/experimental/__init__.py
Gradata/src/gradata/tuning/__init__.py
Gradata/src/gradata/cli.py
Gradata/src/gradata/_correction_metrics.py
Gradata/src/gradata/integrations/__init__.py
Gradata/src/gradata/enhancements/scoring/correction_tracking.py
Gradata/src/gradata/_manifest_metrics.py
Gradata/src/gradata/integrations/agent_lightning/runner.py
Gradata/src/gradata/enhancements/llm_synthesizer.py
Gradata/src/gradata/enhancements/meta_rules.py
Gradata/src/gradata/enhancements/llm_provider.py
Gradata/src/gradata/tuning/agent_lightning/__init__.py
Gradata/src/gradata/integrations/agent_lightning/reward.py
Gradata/src/gradata/enhancements/graduation/judgment_decay.py
Gradata/src/gradata/enhancements/graduation/rules_distillation.py
Gradata/src/gradata/integrations/agent_lightning/__init__.py
Gradata/src/gradata/integrations/agent_lightning/litagent.py
Gradata/src/gradata/enhancements/experimental/rules_distillation.py
Gradata/src/gradata/tuning/agent_lightning/reward.py
Gradata/src/gradata/tuning/agent_lightning/litagent.py
Gradata/src/gradata/tuning/agent_lightning/runner.py
Gradata/src/gradata/enhancements/experimental/judgment_decay.py

Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

Gradata/tests/test_agent_lightning_bridge.py
Gradata/tests/test_llm_synthesizer.py

🔇 Additional comments (23)

Gradata/src/gradata/enhancements/llm_provider.py (1)

239-239: Docstring clarification is accurate and low-risk.

The updated wording keeps behavior description clear without changing execution semantics.

Gradata/src/gradata/enhancements/meta_rules.py (2)

25-27: Reference updates now match the actual synthesis path.

Good alignment to llm_synthesizer and current provider-based flow.

Also applies to: 64-65

410-413: Deterministic-vs-LLM responsibility boundary is clearly documented.

This makes the merge_into_meta contract easier to reason about for callers and tests.

Gradata/tests/fixtures/synthesize_prompt.golden.txt (1)

1-9: Golden prompt fixture is crisp and contract-focused.

This is a strong deterministic baseline for template-regression tests.

Gradata/tests/test_llm_synthesizer.py (1)

122-128: Great addition of a deterministic prompt-contract test.

Locking _build_prompt output to a golden fixture materially reduces accidental prompt drift.

Also applies to: 132-133, 136-136

Gradata/src/gradata/enhancements/llm_synthesizer.py (2)

96-96: Prompt-builder extraction at the call site is a clean refactor.

This keeps synthesis flow unchanged while making prompt behavior testable in isolation.

155-165: _build_prompt centralizes template logic effectively.

Nice cohesion improvement, and it pairs well with the new golden-fixture test.

Gradata/src/gradata/enhancements/experimental/__init__.py (1)

1-1: Docstring is clear and scoped correctly.
Good lightweight module marker for experimental-only wiring.

Gradata/src/gradata/tuning/__init__.py (1)

1-1: Namespace description matches the new package role.
Looks good.

Gradata/src/gradata/integrations/__init__.py (1)

14-16: Deprecation forwarding note is accurate and explicit.
The updated target paths/readability are solid.

Gradata/src/gradata/_correction_metrics.py (1)

10-24: Shared ratio helper is clean and predictable.
Centralizing this logic reduces drift across metric call sites.

Gradata/src/gradata/enhancements/graduation/judgment_decay.py (1)

7-15: Shim behavior is correct for staged migration.
Deprecation warning + forwarding import aligns with the compatibility plan.

Gradata/examples/tune_one_prompt.py (1)

7-7: Example import was updated to the new tuning namespace correctly.
Looks consistent with the deprecation path.

Gradata/src/gradata/cli.py (1)

510-510: CLI tuning import migration is correct.
This points cmd_tune at the canonical gradata.tuning path.

Gradata/src/gradata/enhancements/scoring/correction_tracking.py (1)

19-20: Shared correction-rate helper adoption looks good.

This keeps correction-rate behavior centralized and consistent with the metrics layer.

Also applies to: 373-373

Gradata/tests/test_agent_lightning_bridge.py (1)

20-20: Import path migration in tests is correct.

The test now validates the new gradata.tuning.agent_lightning surface directly.

Gradata/src/gradata/_manifest_metrics.py (1)

31-31: Correction-rate normalization is consistently applied.

Using the shared helper in both trend and quality paths is a solid cleanup and reduces drift risk.

Also applies to: 105-105, 355-360

Gradata/src/gradata/integrations/agent_lightning/__init__.py (1)

1-1: Deprecation shim is implemented cleanly.

Lazy forwarding plus preserved __all__ keeps backward compatibility while steering callers to the new module.

Also applies to: 8-13, 16-16, 27-37

Gradata/src/gradata/integrations/agent_lightning/runner.py (1)

1-1: Runner shim migration path looks good.

Deprecation warning + re-export preserves compatibility with minimal risk.

Also applies to: 7-12, 14-14

Gradata/src/gradata/tuning/agent_lightning/__init__.py (1)

15-24: New tuning package export surface is solid.

The lazy __getattr__ pattern and __all__ declaration provide a clean, stable API boundary.

Also applies to: 27-40

Gradata/src/gradata/enhancements/graduation/rules_distillation.py (1)

7-12: Rules-distillation compatibility shim is well-structured.

Deprecation notice and explicit re-exported API make this transition straightforward for downstream imports.

Also applies to: 14-20, 22-28

Gradata/src/gradata/integrations/agent_lightning/reward.py (1)

7-12: Reward shim migration is correctly implemented.

The deprecated path still works and cleanly forwards to the new tuning namespace.

Also applies to: 14-14

Gradata/src/gradata/integrations/agent_lightning/litagent.py (1)

7-14: Deprecation shim looks correct and consistent.

The warning + re-export behavior is clear and keeps backward compatibility intact.

coderabbitai · 2026-05-06T22:52:08Z

+    if session_type is None:
+        return True  # backward compat: no filtering
+    testable_types = CATEGORY_SESSION_TYPES.get(category.upper(), ALL_SESSION_TYPES)
+    return session_type.lower() in testable_types


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard unknown session_type values before membership check.

At Line 104, an unrecognized session_type (typo/new value) makes categories effectively non-testable, so decay gets skipped silently across the batch. Add an explicit guard/fallback before the lookup check.

Proposed fix

def is_category_testable(category: str, session_type: str | None) -> bool: @@ if session_type is None: return True # backward compat: no filtering - testable_types = CATEGORY_SESSION_TYPES.get(category.upper(), ALL_SESSION_TYPES) - return session_type.lower() in testable_types + normalized = session_type.lower() + if normalized not in ALL_SESSION_TYPES: + return True # backward compat for unknown/new session types + testable_types = CATEGORY_SESSION_TYPES.get(category.upper(), ALL_SESSION_TYPES) + return normalized in testable_types

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/src/gradata/enhancements/experimental/judgment_decay.py` around lines 101 - 104, The membership check on session_type can silently skip decay for typos/unexpected values; before calling session_type.lower() and looking up CATEGORY_SESSION_TYPES, validate and normalize session_type (ensure it's a str, strip/normalize casing) and explicitly guard unknown values: if the normalized session_type is not present in CATEGORY_SESSION_TYPES keys and not in ALL_SESSION_TYPES, return False (or log a warning) so unknown/typo values don't make categories unintentionally non-testable; reference the session_type variable, CATEGORY_SESSION_TYPES and ALL_SESSION_TYPES when applying this guard.

coderabbitai · 2026-05-06T22:52:08Z

+        # Representative description: most recent entry
+        representative = cat_entries[-1].description
+
+        # Check if already covered by existing rules
+        covered_by = _check_coverage(
+            " ".join(e.description for e in cat_entries),
+            existing_rules,
+        )
+
+        sources = list({e.source for e in cat_entries})
+        statuses = Counter(e.status for e in cat_entries)
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Representative/source selection is order-unstable.

Line 89 assumes input order equals recency, and Line 97 builds evidence_sources from a set (non-deterministic order). This can produce inconsistent proposals/output.

Suggested fix

- # Representative description: most recent entry - representative = cat_entries[-1].description + # Representative description: latest by date (expects ISO-8601 strings) + representative = max(cat_entries, key=lambda e: e.date).description @@ - sources = list({e.source for e in cat_entries}) + sources = sorted({e.source for e in cat_entries})

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/src/gradata/enhancements/experimental/rules_distillation.py` around lines 88 - 99, The code picks a representative description using cat_entries[-1] and builds sources via an unordered set, which is non-deterministic; update the logic to deterministically choose the most recent entry (e.g., select the entry with the max timestamp/created_at field using max(cat_entries, key=lambda e: e.timestamp or e.created_at)) and set representative = that_entry.description, and replace sources = list({e.source for e in cat_entries}) with a sorted deterministic list (e.g., sources = sorted({e.source for e in cat_entries}) or sorted({e.source for e in cat_entries}, key=some_stable_key)) so both representative selection and evidence_sources are stable across runs.

coderabbitai · 2026-05-06T22:52:08Z

+        for value in resources.values():
+            if hasattr(value, "template") or isinstance(value, str):
+                return value
+        return self.prompt_template


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not treat arbitrary string resources as prompt templates.

Line 100-Line 103 can pick unrelated string metadata from resources.values() and use it as the prompt template, producing invalid prompts.

Suggested fix

- for value in resources.values(): - if hasattr(value, "template") or isinstance(value, str): - return value + for key in ("template", "prompt"): + candidate = resources.get(key) + if candidate is not None: + return candidate return self.prompt_template

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/src/gradata/tuning/agent_lightning/litagent.py` around lines 100 - 103, The loop in litagent.py currently treats any string from resources.values() as a prompt template (checking isinstance(value, str)) which can pick unrelated metadata; change the condition to only accept real prompt-template objects (e.g., objects exposing a template attribute or instances of the actual PromptTemplate class your project uses) and remove the isinstance(value, str) branch so only values with hasattr(value, "template") or isinstance(value, PromptTemplate) are returned; update imports to reference PromptTemplate (or the concrete template class used) and leave fallback to self.prompt_template unchanged.

coderabbitai · 2026-05-06T22:52:08Z

+            except Exception as exc:
+                logger.debug("PromptTemplate.format failed, using raw template: %s", exc)
+                return str(prompt_template.template)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Escalate prompt-render fallback failures to warning with traceback.

Line 109-Line 111 currently hides prompt template errors behind debug-only logging, which makes rollout faults hard to diagnose.

Suggested fix

- except Exception as exc: - logger.debug("PromptTemplate.format failed, using raw template: %s", exc) + except Exception: + logger.warning( + "PromptTemplate.format failed, using raw template", + exc_info=True, + ) return str(prompt_template.template)

As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/src/gradata/tuning/agent_lightning/litagent.py` around lines 109 - 111, The current except block in litagent.py that catches formatting failures for PromptTemplate (around the PromptTemplate.format call which returns str(prompt_template.template)) only logs at debug level and hides tracebacks; change the handler in the except Exception as exc within the function/method that uses prompt_template.template to call logger.warning with a descriptive message and include exc_info=True (to emit the traceback), keeping the fallback return of str(prompt_template.template); do not suppress the exception or remove the existing typed Exception catch.

coderabbitai · 2026-05-06T22:52:08Z

+    try:
+        results = brain.search(query, mode="events", top_k=20)
+    except Exception as exc:
+        logger.debug("brain.search failed during reward lookup: %s", exc)
+        results = []
+
+    for result in results or []:
+        event = _event_from_search_result(result)
+        if event and event.get("type") == "CORRECTION":
+            events.append(event)
+
+    try:
+        history = brain.query_events(event_type="CORRECTION", limit=200)
+    except Exception as exc:
+        logger.debug("brain.query_events failed during reward lookup: %s", exc)
+        history = []


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use warning logs with traceback for reward lookup failures.

Line 59 and Line 70 currently swallow lookup failures with debug-only logs, which can mask reward quality regressions in production.

Suggested fix

- except Exception as exc: - logger.debug("brain.search failed during reward lookup: %s", exc) + except Exception: + logger.warning("brain.search failed during reward lookup", exc_info=True) results = [] @@ - except Exception as exc: - logger.debug("brain.query_events failed during reward lookup: %s", exc) + except Exception: + logger.warning("brain.query_events failed during reward lookup", exc_info=True) history = []

As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

try:

results = brain.search(query, mode="events", top_k=20)

except Exception as exc:

logger.debug("brain.search failed during reward lookup: %s", exc)

results = []

for result in results or []:

event = _event_from_search_result(result)

if event and event.get("type") == "CORRECTION":

events.append(event)

try:

history = brain.query_events(event_type="CORRECTION", limit=200)

except Exception as exc:

logger.debug("brain.query_events failed during reward lookup: %s", exc)

history = []

try:

results = brain.search(query, mode="events", top_k=20)

except Exception:

logger.warning("brain.search failed during reward lookup", exc_info=True)

results = []

for result in results or []:

event = _event_from_search_result(result)

if event and event.get("type") == "CORRECTION":

events.append(event)

try:

history = brain.query_events(event_type="CORRECTION", limit=200)

except Exception:

logger.warning("brain.query_events failed during reward lookup", exc_info=True)

history = []

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/src/gradata/tuning/agent_lightning/reward.py` around lines 57 - 72, The current except blocks for brain.search and brain.query_events suppress errors with logger.debug and lose traceback; update the except handlers in reward lookup (around brain.search and brain.query_events in reward.py) to log at warning level and include the exception traceback (pass exc_info=True or use logger.warning(..., exc_info=exc)) while preserving existing fallback behavior (setting results = [] and history = []), so failures are visible in production for functions involving brain.search, brain.query_events, and subsequent processing like _event_from_search_result.

coderabbitai · 2026-05-06T22:52:08Z

+            data = event.get("data") if isinstance(event.get("data"), dict) else {}
+            draft = _first_text(data, ("draft_text", "draft", "task_input", "input"))
+            final = _first_text(data, ("final_text", "final", "expected", "correction"))
+            if not draft or not final:
+                continue


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Support data_json fallback when extracting correction dataset.

Line 148-Line 152 ignores events that store correction payload in data_json, which can silently drop valid training rows and undercut APO tuning.

Suggested fix

- data = event.get("data") if isinstance(event.get("data"), dict) else {} + data = event.get("data") if isinstance(event.get("data"), dict) else {} + if not data: + raw = event.get("data_json") + if isinstance(raw, str): + try: + parsed = json.loads(raw) + if isinstance(parsed, dict): + data = parsed + except json.JSONDecodeError: + pass draft = _first_text(data, ("draft_text", "draft", "task_input", "input")) final = _first_text(data, ("final_text", "final", "expected", "correction"))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/src/gradata/tuning/agent_lightning/runner.py` around lines 148 - 152, The code currently only looks at event.get("data") for draft/final extraction and skips events that store payload in data_json; update the extraction to also check event.get("data_json") as a fallback: if event.get("data") is not a dict, attempt to use event.get("data_json") (if it's a dict use it directly; if it's a JSON string parse it with json.loads) before calling _first_text for draft_text/draft/task_input/input and final_text/final/expected/correction. Ensure you still skip when neither source yields a draft or final and keep using the existing _first_text calls and variable names (data, draft, final).

coderabbitai · 2026-05-06T22:52:08Z

+    except Exception as exc:
+        logger.debug("APO did not expose a best prompt, using seed prompt: %s", exc)
+        return fallback


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use warning+traceback for best-prompt fallback exceptions.

Line 206-Line 208 currently logs best-prompt retrieval failures at debug level only, which can hide optimizer/runtime breakages.

Suggested fix

- except Exception as exc: - logger.debug("APO did not expose a best prompt, using seed prompt: %s", exc) + except Exception: + logger.warning( + "APO did not expose a best prompt, using seed prompt", + exc_info=True, + ) return fallback

As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/src/gradata/tuning/agent_lightning/runner.py` around lines 206 - 208, The except block that currently does "except Exception as exc" then logs at debug and returns fallback should instead log a warning including the traceback so failures aren't silently hidden; update the handler around the APO best-prompt retrieval to call logger.warning with a clear message and exc_info=True (keeping the "except Exception as exc" and the subsequent "return fallback") so the exception and stacktrace for the best-prompt fallback are recorded (refer to symbols APO, fallback, and logger in runner.py).

coderabbitai · 2026-05-06T22:52:08Z

+    expected = Path("tests/fixtures/synthesize_prompt.golden.txt").read_text(
+        encoding="utf-8"
+    ).rstrip("\n")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make fixture path independent from current working directory.

Using Path("tests/...") can fail when tests are run outside repo-root CWD. Resolve from __file__ instead.

Suggested fix

- expected = Path("tests/fixtures/synthesize_prompt.golden.txt").read_text( + expected = (Path(__file__).resolve().parent / "fixtures" / "synthesize_prompt.golden.txt").read_text( encoding="utf-8" ).rstrip("\n")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

expected = Path("tests/fixtures/synthesize_prompt.golden.txt").read_text(

encoding="utf-8"

).rstrip("\n")

expected = (Path(__file__).resolve().parent / "fixtures" / "synthesize_prompt.golden.txt").read_text(

encoding="utf-8"

).rstrip("\n")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/tests/test_llm_synthesizer.py` around lines 129 - 131, The test currently builds the fixture path with Path("tests/...") which breaks when CWD isn't the repo root; update the code that sets expected (in test_llm_synthesizer.py) to resolve the fixture relative to the test file by using Path(__file__).parent (or appropriate ancestor) joined with "fixtures/synthesize_prompt.golden.txt" before calling read_text so the fixture lookup is independent of the current working directory.

greptile-apps Bot reviewed May 6, 2026

View reviewed changes

coderabbitai Bot added feature refactor labels May 6, 2026

coderabbitai Bot requested changes May 6, 2026

View reviewed changes

Gradata merged commit 7cfb34d into main May 6, 2026
7 of 9 checks passed

Gradata deleted the chore/cleanup-bcd branch May 6, 2026 22:52

Conversation

Gradata commented May 6, 2026

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 6, 2026 •

edited

Loading