
chore: cleanup PRs B+C+D from CLEANUP_ROADMAP#182

Merged
Gradata merged 1 commit into main from chore/cleanup-bcd on May 6, 2026


Gradata (Owner) commented May 6, 2026

Consolidates the remaining cleanup work from CLEANUP_ROADMAP.md. Follows up on PR #179 (Cleanup PR A).

PR B (Section 1+2):

  • Delete graduation/scoring.py + tests (alternate scoring spine, no production callers)
  • Move judgment_decay and rules_distillation to enhancements/experimental/ with back-compat shims
  • Factor out a shared correction-rate helper into _correction_metrics.py; both _manifest_metrics and scoring/correction_tracking now call it

PR C (Section 3):

  • Delete rule_synthesizer.py + tests (no production callers)
  • Update meta_rules docstring to point at llm_synthesizer (the actual code path)
  • Add golden prompt fixture for llm_synthesizer._build_prompt

PR D (Section 6):

  • Rename integrations/agent_lightning to tuning/agent_lightning
  • Back-compat shim at old path with DeprecationWarning
  • Update cli.py, examples/tune_one_prompt.py, and tests to import from tuning/
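The shims are described as re-exporting from the new location with a DeprecationWarning, and the review notes that the agent_lightning package shim forwards lazily while preserving `__all__`. Below is a sketch of the lazy variant using a module-level `__getattr__` (PEP 562). `make_deprecated_alias` is a hypothetical helper that builds the module in memory so the example runs standalone; the real shim would be a file at the old import path. The standard-library `json` module stands in for gradata.tuning.agent_lightning:

```python
import importlib
import types
import warnings


def make_deprecated_alias(old_name: str, new_name: str, exported: list[str]) -> types.ModuleType:
    """Build a module that lazily forwards `exported` names to `new_name`."""
    shim = types.ModuleType(old_name)
    shim.__all__ = list(exported)  # keep the public surface identical

    def __getattr__(attr: str):
        # PEP 562: called only when normal attribute lookup on the module fails.
        if attr in exported:
            warnings.warn(
                f"{old_name} is deprecated; import from {new_name} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return getattr(importlib.import_module(new_name), attr)
        raise AttributeError(f"module {old_name!r} has no attribute {attr!r}")

    shim.__getattr__ = __getattr__
    return shim


# Usage: json stands in for the new tuning location.
legacy = make_deprecated_alias("legacy_json", "json", ["dumps"])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text = legacy.dumps({"a": 1})  # emits DeprecationWarning, then forwards
```

A simpler alternative is to call `warnings.warn` at the top of the old module and re-export the names eagerly; that warns on import rather than on first use of a name.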

Tests:

  • 4181 passed, 11 skipped, 13 deselected (daemon_extended and plugin_integration tests were deselected; the Codex sandbox blocks the daemon socket bind)
  • Full suite will run on CI
  • Layering check: no Layer 0 → Layer 2 imports introduced

Risk: moderate. All renames have shims, and every deletion was verified with rg, which showed zero production callers.

Generated by codex/gpt-5.5 worker (proc_77cd9c62831a). Recovery patch landed via parent agent.

PR B (Section 1 + Section 2):
- DELETE graduation/scoring.py + tests/test_graduation_scoring.py (alternate
  scoring spine, no production callers)
- MOVE judgment_decay → enhancements/experimental/judgment_decay (with shim)
- MOVE rules_distillation → enhancements/experimental/rules_distillation (with shim)
- ADD shared correction-rate helper at src/gradata/_correction_metrics.py;
  _manifest_metrics + scoring/correction_tracking now call it (no formula
  duplication)

PR C (Section 3):
- DELETE rule_synthesizer.py + test_rule_synthesizer.py (no production
  callers; the meta_rules docstring referenced it, but the actual code path is
  llm_synthesizer)
- ADD golden prompt fixture for llm_synthesizer._build_prompt
- UPDATE meta_rules docstring to point at llm_synthesizer (the live path)

PR D (Section 6):
- RENAME integrations/agent_lightning → tuning/agent_lightning
- ADD back-compat shim at integrations/agent_lightning with DeprecationWarning
- UPDATE cli.py + examples/tune_one_prompt.py + tests to import from tuning/
- LEAVE integrations/session_history.py shim alone (already handled in PR #179)

Validation:
- pytest -x -k 'not daemon_extended and not plugin_integration': 4181 passed,
  11 skipped, 13 deselected
- Full pytest -x blocked by sandbox socket bind in test_daemon_extended; will
  run on CI

Layering check: no Layer 0 → 2 imports introduced. All renames have shims.

Generated by codex/gpt-5.5 worker (proc_77cd9c62831a). Author: Oliver Le.

greptile-apps Bot left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

This PR restructures Agent-Lightning integration from gradata.integrations to a new gradata.tuning namespace, extracts shared correction-metric utilities, moves judgment-decay and rules-distillation implementations to an experimental module with deprecated shims in graduation, adds a new graduation scoring engine, and refactors LLM/session-close logic.

Changes

Core Refactor: Namespace Reorganization and Utility Extraction

  • Shared Utilities (src/gradata/_correction_metrics.py)
    New module exports correction_rate() helper for centralized correction-rate calculations with optional rounding.
  • Tuning Module Scaffold (src/gradata/tuning/__init__.py, src/gradata/tuning/agent_lightning/__init__.py)
    New gradata.tuning package with lazy-loading __getattr__ entry point for Agent-Lightning bridge.
  • Agent-Lightning Bridge (New) (src/gradata/tuning/agent_lightning/litagent.py, reward.py, runner.py)
    Full implementations of GradataLitAgent wrapper, gradata_reward scoring, and run_apo_tune CLI runner relocated from integrations module.
  • Experimental Enhancements (src/gradata/enhancements/experimental/__init__.py, judgment_decay.py, rules_distillation.py)
    New experimental module containing pure-computation implementations of judgment-decay (with session-type awareness, decay logic) and rules-distillation (grouping, coverage checking, proposal formatting).
  • Deprecated Shims (src/gradata/integrations/agent_lightning/, src/gradata/enhancements/graduation/judgment_decay.py, rules_distillation.py)
    Old integration and graduation modules converted to deprecation-warning shims that re-export from new locations (tuning and experimental respectively).
  • Graduation Scoring Engine (src/gradata/enhancements/graduation/scoring.py)
    New opt-in scoring module with blended component calculation (confidence, fire saturation, reliability veto, recency decay, maturity), cut-point-based state transitions, and should_graduate_lesson wrapper.
  • LLM & Session Utilities (src/gradata/enhancements/llm_synthesizer.py)
    Extracted _build_prompt() helper to centralize LLM prompt template construction.
  • Session Hook Simplification (src/gradata/hooks/session_close.py)
    Removed _refresh_brain_prompt functionality; graduation/pipeline waterfall now gated only by _has_new_triggers.
  • Correction Tracking Integration (src/gradata/enhancements/scoring/correction_tracking.py, src/gradata/_manifest_metrics.py)
    Updated to use shared correction_rate() helper; no behavioral changes to correction-profile or manifest-metric computation.
  • Import Updates & CLI (src/gradata/cli.py, examples/tune_one_prompt.py, src/gradata/integrations/__init__.py, src/gradata/enhancements/meta_rules.py, llm_provider.py)
    Updated run_apo_tune imports to use gradata.tuning.agent_lightning; added/clarified deprecation documentation.
  • Tests & Fixtures (tests/test_agent_lightning_bridge.py, test_graduation_scoring.py, test_llm_synthesizer.py, test_rule_synthesizer.py, fixtures/synthesize_prompt.golden.txt)
    New comprehensive test coverage for graduation scoring logic, LLM prompt golden fixture, rule synthesizer fail-safe contracts; updated Agent-Lightning bridge tests to import from new module location.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Gradata/gradata#106: Modifies session_close hook with trigger-gating and .last_close_ts stamping logic.
  • Gradata/gradata#172: Refactors Agent-Lightning bridge and tuning CLI integration.
  • Gradata/gradata#93: Earlier merge that introduced tuning.agent_lightning, judgment_decay, rules_distillation, and related enhancements.

Suggested labels

feature, refactor

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 30.43%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title 'chore: cleanup PRs B+C+D from CLEANUP_ROADMAP' clearly and concisely summarizes the main change: consolidating cleanup work from a roadmap across multiple sections.
  • Description check: ✅ Passed. The description is directly related to the changeset, providing detailed context about the cleanup work being consolidated, including specific sections (B, C, D), affected files, and validation results.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.20.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

Loading rules from local config...
[00.27][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal



coderabbitai Bot left a comment

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/enhancements/experimental/judgment_decay.py`:
- Around line 101-104: The membership check on session_type can silently skip
decay for typos/unexpected values; before calling session_type.lower() and
looking up CATEGORY_SESSION_TYPES, validate and normalize session_type (ensure
it's a str, strip/normalize casing) and explicitly guard unknown values: if the
normalized session_type is not present in CATEGORY_SESSION_TYPES keys and not in
ALL_SESSION_TYPES, return False (or log a warning) so unknown/typo values don't
make categories unintentionally non-testable; reference the session_type
variable, CATEGORY_SESSION_TYPES and ALL_SESSION_TYPES when applying this guard.

In `@Gradata/src/gradata/enhancements/experimental/rules_distillation.py`:
- Around line 88-99: The code picks a representative description using
cat_entries[-1] and builds sources via an unordered set, which is
non-deterministic; update the logic to deterministically choose the most recent
entry (e.g., select the entry with the max timestamp/created_at field using
max(cat_entries, key=lambda e: e.timestamp or e.created_at)) and set
representative = that_entry.description, and replace sources = list({e.source
for e in cat_entries}) with a sorted deterministic list (e.g., sources =
sorted({e.source for e in cat_entries}) or sorted({e.source for e in
cat_entries}, key=some_stable_key)) so both representative selection and
evidence_sources are stable across runs.

In `@Gradata/src/gradata/tuning/agent_lightning/litagent.py`:
- Around line 109-111: The current except block in litagent.py that catches
formatting failures for PromptTemplate (around the PromptTemplate.format call
which returns str(prompt_template.template)) only logs at debug level and hides
tracebacks; change the handler in the except Exception as exc within the
function/method that uses prompt_template.template to call logger.warning with a
descriptive message and include exc_info=True (to emit the traceback), keeping
the fallback return of str(prompt_template.template); do not suppress the
exception or remove the existing typed Exception catch.
- Around line 100-103: The loop in litagent.py currently treats any string from
resources.values() as a prompt template (checking isinstance(value, str)) which
can pick unrelated metadata; change the condition to only accept real
prompt-template objects (e.g., objects exposing a template attribute or
instances of the actual PromptTemplate class your project uses) and remove the
isinstance(value, str) branch so only values with hasattr(value, "template") or
isinstance(value, PromptTemplate) are returned; update imports to reference
PromptTemplate (or the concrete template class used) and leave fallback to
self.prompt_template unchanged.

In `@Gradata/src/gradata/tuning/agent_lightning/reward.py`:
- Around line 57-72: The current except blocks for brain.search and
brain.query_events suppress errors with logger.debug and lose traceback; update
the except handlers in reward lookup (around brain.search and brain.query_events
in reward.py) to log at warning level and include the exception traceback (pass
exc_info=True or use logger.warning(..., exc_info=exc)) while preserving
existing fallback behavior (setting results = [] and history = []), so failures
are visible in production for functions involving brain.search,
brain.query_events, and subsequent processing like _event_from_search_result.

In `@Gradata/src/gradata/tuning/agent_lightning/runner.py`:
- Around line 206-208: The except block that currently does "except Exception as
exc" then logs at debug and returns fallback should instead log a warning
including the traceback so failures aren't silently hidden; update the handler
around the APO best-prompt retrieval to call logger.warning with a clear message
and exc_info=True (keeping the "except Exception as exc" and the subsequent
"return fallback") so the exception and stacktrace for the best-prompt fallback
are recorded (refer to symbols APO, fallback, and logger in runner.py).
- Around line 148-152: The code currently only looks at event.get("data") for
draft/final extraction and skips events that store payload in data_json; update
the extraction to also check event.get("data_json") as a fallback: if
event.get("data") is not a dict, attempt to use event.get("data_json") (if it's
a dict use it directly; if it's a JSON string parse it with json.loads) before
calling _first_text for draft_text/draft/task_input/input and
final_text/final/expected/correction. Ensure you still skip when neither source
yields a draft or final and keep using the existing _first_text calls and
variable names (data, draft, final).

In `@Gradata/tests/test_llm_synthesizer.py`:
- Around line 129-131: The test currently builds the fixture path with
Path("tests/...") which breaks when CWD isn't the repo root; update the code
that sets expected (in test_llm_synthesizer.py) to resolve the fixture relative
to the test file by using Path(__file__).parent (or appropriate ancestor) joined
with "fixtures/synthesize_prompt.golden.txt" before calling read_text so the
fixture lookup is independent of the current working directory.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ef933b46-7a6e-49c0-b452-68cc6820f840

📥 Commits

Reviewing files that changed from the base of the PR and between 86c972f and f5cf6fc.

📒 Files selected for processing (31)
  • Gradata/examples/tune_one_prompt.py
  • Gradata/src/gradata/_correction_metrics.py
  • Gradata/src/gradata/_manifest_metrics.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/enhancements/experimental/__init__.py
  • Gradata/src/gradata/enhancements/experimental/judgment_decay.py
  • Gradata/src/gradata/enhancements/experimental/rules_distillation.py
  • Gradata/src/gradata/enhancements/graduation/judgment_decay.py
  • Gradata/src/gradata/enhancements/graduation/rules_distillation.py
  • Gradata/src/gradata/enhancements/graduation/scoring.py
  • Gradata/src/gradata/enhancements/llm_provider.py
  • Gradata/src/gradata/enhancements/llm_synthesizer.py
  • Gradata/src/gradata/enhancements/meta_rules.py
  • Gradata/src/gradata/enhancements/rule_synthesizer.py
  • Gradata/src/gradata/enhancements/scoring/correction_tracking.py
  • Gradata/src/gradata/hooks/session_close.py
  • Gradata/src/gradata/integrations/__init__.py
  • Gradata/src/gradata/integrations/agent_lightning/__init__.py
  • Gradata/src/gradata/integrations/agent_lightning/litagent.py
  • Gradata/src/gradata/integrations/agent_lightning/reward.py
  • Gradata/src/gradata/integrations/agent_lightning/runner.py
  • Gradata/src/gradata/tuning/__init__.py
  • Gradata/src/gradata/tuning/agent_lightning/__init__.py
  • Gradata/src/gradata/tuning/agent_lightning/litagent.py
  • Gradata/src/gradata/tuning/agent_lightning/reward.py
  • Gradata/src/gradata/tuning/agent_lightning/runner.py
  • Gradata/tests/fixtures/synthesize_prompt.golden.txt
  • Gradata/tests/test_agent_lightning_bridge.py
  • Gradata/tests/test_graduation_scoring.py
  • Gradata/tests/test_llm_synthesizer.py
  • Gradata/tests/test_rule_synthesizer.py
💤 Files with no reviewable changes (5)
  • Gradata/src/gradata/hooks/session_close.py
  • Gradata/tests/test_rule_synthesizer.py
  • Gradata/src/gradata/enhancements/rule_synthesizer.py
  • Gradata/tests/test_graduation_scoring.py
  • Gradata/src/gradata/enhancements/graduation/scoring.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.12
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/enhancements/experimental/__init__.py
  • Gradata/src/gradata/tuning/__init__.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/_correction_metrics.py
  • Gradata/src/gradata/integrations/__init__.py
  • Gradata/src/gradata/enhancements/scoring/correction_tracking.py
  • Gradata/src/gradata/_manifest_metrics.py
  • Gradata/src/gradata/integrations/agent_lightning/runner.py
  • Gradata/src/gradata/enhancements/llm_synthesizer.py
  • Gradata/src/gradata/enhancements/meta_rules.py
  • Gradata/src/gradata/enhancements/llm_provider.py
  • Gradata/src/gradata/tuning/agent_lightning/__init__.py
  • Gradata/src/gradata/integrations/agent_lightning/reward.py
  • Gradata/src/gradata/enhancements/graduation/judgment_decay.py
  • Gradata/src/gradata/enhancements/graduation/rules_distillation.py
  • Gradata/src/gradata/integrations/agent_lightning/__init__.py
  • Gradata/src/gradata/integrations/agent_lightning/litagent.py
  • Gradata/src/gradata/enhancements/experimental/rules_distillation.py
  • Gradata/src/gradata/tuning/agent_lightning/reward.py
  • Gradata/src/gradata/tuning/agent_lightning/litagent.py
  • Gradata/src/gradata/tuning/agent_lightning/runner.py
  • Gradata/src/gradata/enhancements/experimental/judgment_decay.py
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/test_agent_lightning_bridge.py
  • Gradata/tests/test_llm_synthesizer.py
🔇 Additional comments (23)
Gradata/src/gradata/enhancements/llm_provider.py (1)

239-239: Docstring clarification is accurate and low-risk.

The updated wording keeps behavior description clear without changing execution semantics.

Gradata/src/gradata/enhancements/meta_rules.py (2)

25-27: Reference updates now match the actual synthesis path.

Good alignment to llm_synthesizer and current provider-based flow.

Also applies to: 64-65


410-413: Deterministic-vs-LLM responsibility boundary is clearly documented.

This makes the merge_into_meta contract easier to reason about for callers and tests.

Gradata/tests/fixtures/synthesize_prompt.golden.txt (1)

1-9: Golden prompt fixture is crisp and contract-focused.

This is a strong deterministic baseline for template-regression tests.

Gradata/tests/test_llm_synthesizer.py (1)

122-128: Great addition of a deterministic prompt-contract test.

Locking _build_prompt output to a golden fixture materially reduces accidental prompt drift.

Also applies to: 132-133, 136-136

Gradata/src/gradata/enhancements/llm_synthesizer.py (2)

96-96: Prompt-builder extraction at the call site is a clean refactor.

This keeps synthesis flow unchanged while making prompt behavior testable in isolation.


155-165: _build_prompt centralizes template logic effectively.

Nice cohesion improvement, and it pairs well with the new golden-fixture test.

Gradata/src/gradata/enhancements/experimental/__init__.py (1)

1-1: Docstring is clear and scoped correctly.
Good lightweight module marker for experimental-only wiring.

Gradata/src/gradata/tuning/__init__.py (1)

1-1: Namespace description matches the new package role.
Looks good.

Gradata/src/gradata/integrations/__init__.py (1)

14-16: Deprecation forwarding note is accurate and explicit.
The updated target paths/readability are solid.

Gradata/src/gradata/_correction_metrics.py (1)

10-24: Shared ratio helper is clean and predictable.
Centralizing this logic reduces drift across metric call sites.

Gradata/src/gradata/enhancements/graduation/judgment_decay.py (1)

7-15: Shim behavior is correct for staged migration.
Deprecation warning + forwarding import aligns with the compatibility plan.

Gradata/examples/tune_one_prompt.py (1)

7-7: Example import was updated to the new tuning namespace correctly.
Looks consistent with the deprecation path.

Gradata/src/gradata/cli.py (1)

510-510: CLI tuning import migration is correct.
This points cmd_tune at the canonical gradata.tuning path.

Gradata/src/gradata/enhancements/scoring/correction_tracking.py (1)

19-20: Shared correction-rate helper adoption looks good.

This keeps correction-rate behavior centralized and consistent with the metrics layer.

Also applies to: 373-373

Gradata/tests/test_agent_lightning_bridge.py (1)

20-20: Import path migration in tests is correct.

The test now validates the new gradata.tuning.agent_lightning surface directly.

Gradata/src/gradata/_manifest_metrics.py (1)

31-31: Correction-rate normalization is consistently applied.

Using the shared helper in both trend and quality paths is a solid cleanup and reduces drift risk.

Also applies to: 105-105, 355-360

Gradata/src/gradata/integrations/agent_lightning/__init__.py (1)

1-1: Deprecation shim is implemented cleanly.

Lazy forwarding plus preserved __all__ keeps backward compatibility while steering callers to the new module.

Also applies to: 8-13, 16-16, 27-37

Gradata/src/gradata/integrations/agent_lightning/runner.py (1)

1-1: Runner shim migration path looks good.

Deprecation warning + re-export preserves compatibility with minimal risk.

Also applies to: 7-12, 14-14

Gradata/src/gradata/tuning/agent_lightning/__init__.py (1)

15-24: New tuning package export surface is solid.

The lazy __getattr__ pattern and __all__ declaration provide a clean, stable API boundary.

Also applies to: 27-40

Gradata/src/gradata/enhancements/graduation/rules_distillation.py (1)

7-12: Rules-distillation compatibility shim is well-structured.

Deprecation notice and explicit re-exported API make this transition straightforward for downstream imports.

Also applies to: 14-20, 22-28

Gradata/src/gradata/integrations/agent_lightning/reward.py (1)

7-12: Reward shim migration is correctly implemented.

The deprecated path still works and cleanly forwards to the new tuning namespace.

Also applies to: 14-14

Gradata/src/gradata/integrations/agent_lightning/litagent.py (1)

7-14: Deprecation shim looks correct and consistent.

The warning + re-export behavior is clear and keeps backward compatibility intact.

Comment on lines +101 to +104
    if session_type is None:
        return True  # backward compat: no filtering
    testable_types = CATEGORY_SESSION_TYPES.get(category.upper(), ALL_SESSION_TYPES)
    return session_type.lower() in testable_types


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard unknown session_type values before membership check.

At Line 104, an unrecognized session_type (typo/new value) makes categories effectively non-testable, so decay gets skipped silently across the batch. Add an explicit guard/fallback before the lookup check.

Proposed fix
 def is_category_testable(category: str, session_type: str | None) -> bool:
@@
     if session_type is None:
         return True  # backward compat: no filtering
-    testable_types = CATEGORY_SESSION_TYPES.get(category.upper(), ALL_SESSION_TYPES)
-    return session_type.lower() in testable_types
+    normalized = session_type.lower()
+    if normalized not in ALL_SESSION_TYPES:
+        return True  # backward compat for unknown/new session types
+    testable_types = CATEGORY_SESSION_TYPES.get(category.upper(), ALL_SESSION_TYPES)
+    return normalized in testable_types

Comment on lines +88 to +99
        # Representative description: most recent entry
        representative = cat_entries[-1].description

        # Check if already covered by existing rules
        covered_by = _check_coverage(
            " ".join(e.description for e in cat_entries),
            existing_rules,
        )

        sources = list({e.source for e in cat_entries})
        statuses = Counter(e.status for e in cat_entries)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Representative/source selection is order-unstable.

Line 89 assumes input order equals recency, and Line 97 builds evidence_sources from a set (non-deterministic order). This can produce inconsistent proposals/output.

Suggested fix
-        # Representative description: most recent entry
-        representative = cat_entries[-1].description
+        # Representative description: latest by date (expects ISO-8601 strings)
+        representative = max(cat_entries, key=lambda e: e.date).description
@@
-        sources = list({e.source for e in cat_entries})
+        sources = sorted({e.source for e in cat_entries})
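Below is a sketch of the suggested fix that also makes ties deterministic. `max` returns the first maximal element it encounters, so two entries with the same date would still depend on input order; adding the description as a secondary key removes that dependence. The field names (description, source, status, date) come from the excerpt and diff. The `Entry` dataclass and `summarize` function are hypothetical, and comparing dates as strings is only valid if every date uses the same ISO-8601 format:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:  # hypothetical stand-in for the real entry type
    description: str
    source: str
    status: str
    date: str  # ISO-8601, same precision everywhere, e.g. "2026-05-06"


def summarize(cat_entries: list[Entry]) -> tuple[str, list[str], Counter[str]]:
    # Latest by date; ties broken by description so input order never matters.
    representative = max(cat_entries, key=lambda e: (e.date, e.description)).description
    sources = sorted({e.source for e in cat_entries})  # stable evidence_sources order
    statuses = Counter(e.status for e in cat_entries)
    return representative, sources, statuses
```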

Comment on lines +100 to +103
        for value in resources.values():
            if hasattr(value, "template") or isinstance(value, str):
                return value
        return self.prompt_template


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not treat arbitrary string resources as prompt templates.

Lines 100-103 can pick unrelated string metadata from resources.values() and use it as the prompt template, producing invalid prompts.

Suggested fix
-        for value in resources.values():
-            if hasattr(value, "template") or isinstance(value, str):
-                return value
+        for key in ("template", "prompt"):
+            candidate = resources.get(key)
+            if candidate is not None:
+                return candidate
         return self.prompt_template
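The suggested diff and the AI-agent prompt take slightly different approaches: explicit keys versus duck-typing on a `template` attribute. The sketch below combines them. The key names come from the suggested fix, and `select_prompt_template` is a hypothetical free-function version of the method. It avoids importing a concrete PromptTemplate class because that class's import path is not shown here:

```python
from collections.abc import Mapping
from typing import Any

# Key names follow the suggested fix; treat them as an assumption about how
# resources are named in practice.
PROMPT_KEYS = ("template", "prompt")


def select_prompt_template(resources: Mapping[str, Any], fallback: Any) -> Any:
    """Pick the prompt template from rollout resources without guessing at strings."""
    # 1. Explicitly named resources win, whatever their type.
    for key in PROMPT_KEYS:
        candidate = resources.get(key)
        if candidate is not None:
            return candidate
    # 2. Otherwise accept only template-like objects. Plain strings (run IDs,
    #    model names, other metadata) have no `template` attribute and are skipped.
    for value in resources.values():
        if hasattr(value, "template"):
            return value
    # 3. Nothing suitable: keep the agent's own template (self.prompt_template).
    return fallback
```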

Comment on lines +109 to +111
            except Exception as exc:
                logger.debug("PromptTemplate.format failed, using raw template: %s", exc)
                return str(prompt_template.template)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Escalate prompt-render fallback failures to warning with traceback.

Lines 109-111 currently hide prompt template errors behind debug-only logging, which makes rollout faults hard to diagnose.

Suggested fix
-            except Exception as exc:
-                logger.debug("PromptTemplate.format failed, using raw template: %s", exc)
+            except Exception:
+                logger.warning(
+                    "PromptTemplate.format failed, using raw template",
+                    exc_info=True,
+                )
                 return str(prompt_template.template)

As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/tuning/agent_lightning/litagent.py` around lines 109 -
111, The current except block in litagent.py that catches formatting failures
for PromptTemplate (around the PromptTemplate.format call which returns
str(prompt_template.template)) only logs at debug level and hides tracebacks;
change the handler in the except Exception as exc within the function/method
that uses prompt_template.template to call logger.warning with a descriptive
message and include exc_info=True (to emit the traceback), keeping the fallback
return of str(prompt_template.template); do not suppress the exception or remove
the existing typed Exception catch.
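The following is a self-contained sketch of the recommended fallback. `RawTemplate` is a hypothetical stand-in for whatever template object litagent.py receives. It assumes only that the object exposes `.template` and `.format(**kwargs)`. `render_prompt` is likewise an illustrative name.

```python
import logging

logger = logging.getLogger(__name__)


class RawTemplate:
    """Hypothetical template object exposing `.template` and `.format()`."""

    def __init__(self, template: str) -> None:
        self.template = template

    def format(self, **kwargs: str) -> str:
        return self.template.format(**kwargs)


def render_prompt(prompt_template: RawTemplate, **kwargs: str) -> str:
    """Format the template, falling back to the raw text on any error."""
    try:
        return prompt_template.format(**kwargs)
    except Exception:
        # Warning level plus exc_info=True records the full traceback,
        # while the raw template is still returned as the fallback.
        logger.warning("PromptTemplate.format failed, using raw template", exc_info=True)
        return str(prompt_template.template)
```

For example, `render_prompt(RawTemplate("Hi {name}"))` raises a `KeyError` internally. It still returns the raw template `"Hi {name}"`, and the traceback now appears in warning-level logs instead of being hidden at debug level.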

Comment on lines +57 to +72
try:
    results = brain.search(query, mode="events", top_k=20)
except Exception as exc:
    logger.debug("brain.search failed during reward lookup: %s", exc)
    results = []

for result in results or []:
    event = _event_from_search_result(result)
    if event and event.get("type") == "CORRECTION":
        events.append(event)

try:
    history = brain.query_events(event_type="CORRECTION", limit=200)
except Exception as exc:
    logger.debug("brain.query_events failed during reward lookup: %s", exc)
    history = []

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use warning logs with traceback for reward lookup failures.

Lines 59 and 70 currently swallow lookup failures with debug-only logs, which can mask reward-quality regressions in production.

Suggested fix
-    except Exception as exc:
-        logger.debug("brain.search failed during reward lookup: %s", exc)
+    except Exception:
+        logger.warning("brain.search failed during reward lookup", exc_info=True)
         results = []
@@
-    except Exception as exc:
-        logger.debug("brain.query_events failed during reward lookup: %s", exc)
+    except Exception:
+        logger.warning("brain.query_events failed during reward lookup", exc_info=True)
         history = []

As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
    results = brain.search(query, mode="events", top_k=20)
except Exception:
    logger.warning("brain.search failed during reward lookup", exc_info=True)
    results = []

for result in results or []:
    event = _event_from_search_result(result)
    if event and event.get("type") == "CORRECTION":
        events.append(event)

try:
    history = brain.query_events(event_type="CORRECTION", limit=200)
except Exception:
    logger.warning("brain.query_events failed during reward lookup", exc_info=True)
    history = []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/tuning/agent_lightning/reward.py` around lines 57 - 72,
The current except blocks for brain.search and brain.query_events suppress
errors with logger.debug and lose traceback; update the except handlers in
reward lookup (around brain.search and brain.query_events in reward.py) to log
at warning level and include the exception traceback (pass exc_info=True or use
logger.warning(..., exc_info=exc)) while preserving existing fallback behavior
(setting results = [] and history = []), so failures are visible in production
for functions involving brain.search, brain.query_events, and subsequent
processing like _event_from_search_result.
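Both reward lookups share the same pattern: try the call, warn with a traceback, and return an empty default. A minimal sketch could factor this into one helper. `call_or_default` is a hypothetical name, and this consolidation is an optional refactor, not part of the review's suggestion.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


def call_or_default(what: str, fn: Callable[[], T], default: T) -> T:
    """Run `fn`; on any exception, log a warning with traceback and return `default`."""
    try:
        return fn()
    except Exception:
        # Same fallback behaviour as the inline blocks, but failures stay visible.
        logger.warning("%s failed during reward lookup", what, exc_info=True)
        return default
```

With this helper, the first lookup would become `results = call_or_default("brain.search", lambda: brain.search(query, mode="events", top_k=20), [])`. The `brain.query_events` call would follow the same pattern with `history` as the target.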

Comment on lines +148 to +152
data = event.get("data") if isinstance(event.get("data"), dict) else {}
draft = _first_text(data, ("draft_text", "draft", "task_input", "input"))
final = _first_text(data, ("final_text", "final", "expected", "correction"))
if not draft or not final:
    continue

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Support data_json fallback when extracting correction dataset.

Lines 148-152 ignore events that store the correction payload in data_json, which can silently drop valid training rows and undercut APO tuning.

Suggested fix
-            data = event.get("data") if isinstance(event.get("data"), dict) else {}
+            data = event.get("data") if isinstance(event.get("data"), dict) else {}
+            if not data:
+                raw = event.get("data_json")
+                if isinstance(raw, str):
+                    try:
+                        parsed = json.loads(raw)
+                        if isinstance(parsed, dict):
+                            data = parsed
+                    except json.JSONDecodeError:
+                        pass
             draft = _first_text(data, ("draft_text", "draft", "task_input", "input"))
             final = _first_text(data, ("final_text", "final", "expected", "correction"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/tuning/agent_lightning/runner.py` around lines 148 - 152,
The code currently only looks at event.get("data") for draft/final extraction
and skips events that store payload in data_json; update the extraction to also
check event.get("data_json") as a fallback: if event.get("data") is not a dict,
attempt to use event.get("data_json") (if it's a dict use it directly; if it's a
JSON string parse it with json.loads) before calling _first_text for
draft_text/draft/task_input/input and final_text/final/expected/correction.
Ensure you still skip when neither source yields a draft or final and keep using
the existing _first_text calls and variable names (data, draft, final).
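The suggested diff only parses string `data_json`, while the agent prompt also asks for a dict `data_json` to be used directly. A minimal sketch covering both cases is shown below. `event_payload` is a hypothetical helper name. The diff also relies on `json` being imported in runner.py, which is not shown in the diff.

```python
import json
from collections.abc import Mapping
from typing import Any


def event_payload(event: Mapping[str, Any]) -> dict:
    """Return the event's data dict, falling back to data_json (dict or JSON string)."""
    data = event.get("data")
    if isinstance(data, dict) and data:
        return data
    raw = event.get("data_json")
    if isinstance(raw, dict):
        # Already decoded: use it as-is.
        return raw
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            # Malformed JSON is treated like a missing payload; the caller skips the row.
            return {}
        if isinstance(parsed, dict):
            return parsed
    return {}
```

The loop body would then start with `data = event_payload(event)`, and the existing `_first_text` calls and the skip condition stay unchanged.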

Comment on lines +206 to +208
except Exception as exc:
    logger.debug("APO did not expose a best prompt, using seed prompt: %s", exc)
    return fallback

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use warning+traceback for best-prompt fallback exceptions.

Lines 206-208 currently log best-prompt retrieval failures only at debug level, which can hide optimizer or runtime breakages.

Suggested fix
-    except Exception as exc:
-        logger.debug("APO did not expose a best prompt, using seed prompt: %s", exc)
+    except Exception:
+        logger.warning(
+            "APO did not expose a best prompt, using seed prompt",
+            exc_info=True,
+        )
         return fallback

As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/tuning/agent_lightning/runner.py` around lines 206 - 208,
The except block that currently does "except Exception as exc" then logs at
debug and returns fallback should instead log a warning including the traceback
so failures aren't silently hidden; update the handler around the APO
best-prompt retrieval to call logger.warning with a clear message and
exc_info=True (keeping the "except Exception as exc" and the subsequent "return
fallback") so the exception and stacktrace for the best-prompt fallback are
recorded (refer to symbols APO, fallback, and logger in runner.py).

Comment on lines +129 to +131
expected = Path("tests/fixtures/synthesize_prompt.golden.txt").read_text(
    encoding="utf-8"
).rstrip("\n")

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make fixture path independent from current working directory.

Using Path("tests/...") fails when tests run from a working directory other than the repository root. Resolve the path from __file__ instead.

Suggested fix
-    expected = Path("tests/fixtures/synthesize_prompt.golden.txt").read_text(
+    expected = (Path(__file__).resolve().parent / "fixtures" / "synthesize_prompt.golden.txt").read_text(
         encoding="utf-8"
     ).rstrip("\n")
📝 Committable suggestion

Suggested change
expected = (Path(__file__).resolve().parent / "fixtures" / "synthesize_prompt.golden.txt").read_text(
    encoding="utf-8"
).rstrip("\n")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/tests/test_llm_synthesizer.py` around lines 129 - 131, The test
currently builds the fixture path with Path("tests/...") which breaks when CWD
isn't the repo root; update the code that sets expected (in
test_llm_synthesizer.py) to resolve the fixture relative to the test file by
using Path(__file__).parent (or appropriate ancestor) joined with
"fixtures/synthesize_prompt.golden.txt" before calling read_text so the fixture
lookup is independent of the current working directory.
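If more golden fixtures are added later, the CWD-independent lookup could live in one place. This is a minimal sketch. `TESTS_DIR`, `FIXTURES_DIR`, and `read_fixture` are hypothetical names, and the sketch assumes the fixtures directory sits next to the test module, as in the suggested fix.

```python
from pathlib import Path

# Directory of this test module; resolve() makes it absolute and follows symlinks,
# so the result does not depend on the process's current working directory.
TESTS_DIR = Path(__file__).resolve().parent
FIXTURES_DIR = TESTS_DIR / "fixtures"


def read_fixture(name: str) -> str:
    """Read a fixture file stored next to this test module, dropping trailing newlines."""
    return (FIXTURES_DIR / name).read_text(encoding="utf-8").rstrip("\n")
```

The test would then read `expected = read_fixture("synthesize_prompt.golden.txt")`. This keeps working when pytest is launched from a subdirectory or from outside the repository.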

Gradata merged commit 7cfb34d into main on May 6, 2026
7 of 9 checks passed
Gradata deleted the chore/cleanup-bcd branch on May 6, 2026 at 22:52