chore: cleanup PRs B+C+D from CLEANUP_ROADMAP#182
Conversation
PR B (Section 1 + Section 2): - DELETE graduation/scoring.py + tests/test_graduation_scoring.py (alternate scoring spine, no production callers) - MOVE judgment_decay → enhancements/experimental/judgment_decay (with shim) - MOVE rules_distillation → enhancements/experimental/rules_distillation (with shim) - ADD shared correction-rate helper at src/gradata/_correction_metrics.py; _manifest_metrics + scoring/correction_tracking now call it (no formula duplication) PR C (Section 3): - DELETE rule_synthesizer.py + test_rule_synthesizer.py (no production callers — meta_rules docstring referenced it but actual code path is llm_synthesizer) - ADD golden prompt fixture for llm_synthesizer._build_prompt - UPDATE meta_rules docstring to point at llm_synthesizer (the live path) PR D (Section 6): - RENAME integrations/agent_lightning → tuning/agent_lightning - ADD back-compat shim at integrations/agent_lightning with DeprecationWarning - UPDATE cli.py + examples/tune_one_prompt.py + tests to import from tuning/ - LEAVE integrations/session_history.py shim alone (already handled in PR #179) Validation: - pytest -x -k 'not daemon_extended and not plugin_integration': 4181 passed, 11 skipped, 13 deselected - Full pytest -x blocked by sandbox socket bind in test_daemon_extended; will run on CI Layering check: no Layer 0 → 2 imports introduced. All renames have shims. Generated by codex/gpt-5.5 worker (proc_77cd9c62831a). Author: Oliver Le.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 WalkthroughWalkthroughThis PR restructures Agent-Lightning integration from ChangesCore Refactor: Namespace Reorganization and Utility Extraction
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.20.0)OpenGrep fatal error (exit code 2): �[32m✔�[39m �[1mOpengrep OSS�[0m �[1m Loading rules from local config...�[0m Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Gradata/src/gradata/enhancements/experimental/judgment_decay.py`:
- Around line 101-104: The membership check on session_type can silently skip
decay for typos/unexpected values; before calling session_type.lower() and
looking up CATEGORY_SESSION_TYPES, validate and normalize session_type (ensure
it's a str, strip/normalize casing) and explicitly guard unknown values: if the
normalized session_type is not present in CATEGORY_SESSION_TYPES keys and not in
ALL_SESSION_TYPES, return False (or log a warning) so unknown/typo values don't
make categories unintentionally non-testable; reference the session_type
variable, CATEGORY_SESSION_TYPES and ALL_SESSION_TYPES when applying this guard.
In `@Gradata/src/gradata/enhancements/experimental/rules_distillation.py`:
- Around line 88-99: The code picks a representative description using
cat_entries[-1] and builds sources via an unordered set, which is
non-deterministic; update the logic to deterministically choose the most recent
entry (e.g., select the entry with the max timestamp/created_at field using
max(cat_entries, key=lambda e: e.timestamp or e.created_at)) and set
representative = that_entry.description, and replace sources = list({e.source
for e in cat_entries}) with a sorted deterministic list (e.g., sources =
sorted({e.source for e in cat_entries}) or sorted({e.source for e in
cat_entries}, key=some_stable_key)) so both representative selection and
evidence_sources are stable across runs.
In `@Gradata/src/gradata/tuning/agent_lightning/litagent.py`:
- Around line 109-111: The current except block in litagent.py that catches
formatting failures for PromptTemplate (around the PromptTemplate.format call
which returns str(prompt_template.template)) only logs at debug level and hides
tracebacks; change the handler in the except Exception as exc within the
function/method that uses prompt_template.template to call logger.warning with a
descriptive message and include exc_info=True (to emit the traceback), keeping
the fallback return of str(prompt_template.template); do not suppress the
exception or remove the existing typed Exception catch.
- Around line 100-103: The loop in litagent.py currently treats any string from
resources.values() as a prompt template (checking isinstance(value, str)) which
can pick unrelated metadata; change the condition to only accept real
prompt-template objects (e.g., objects exposing a template attribute or
instances of the actual PromptTemplate class your project uses) and remove the
isinstance(value, str) branch so only values with hasattr(value, "template") or
isinstance(value, PromptTemplate) are returned; update imports to reference
PromptTemplate (or the concrete template class used) and leave fallback to
self.prompt_template unchanged.
In `@Gradata/src/gradata/tuning/agent_lightning/reward.py`:
- Around line 57-72: The current except blocks for brain.search and
brain.query_events suppress errors with logger.debug and lose traceback; update
the except handlers in reward lookup (around brain.search and brain.query_events
in reward.py) to log at warning level and include the exception traceback (pass
exc_info=True or use logger.warning(..., exc_info=exc)) while preserving
existing fallback behavior (setting results = [] and history = []), so failures
are visible in production for functions involving brain.search,
brain.query_events, and subsequent processing like _event_from_search_result.
In `@Gradata/src/gradata/tuning/agent_lightning/runner.py`:
- Around line 206-208: The except block that currently does "except Exception as
exc" then logs at debug and returns fallback should instead log a warning
including the traceback so failures aren't silently hidden; update the handler
around the APO best-prompt retrieval to call logger.warning with a clear message
and exc_info=True (keeping the "except Exception as exc" and the subsequent
"return fallback") so the exception and stacktrace for the best-prompt fallback
are recorded (refer to symbols APO, fallback, and logger in runner.py).
- Around line 148-152: The code currently only looks at event.get("data") for
draft/final extraction and skips events that store payload in data_json; update
the extraction to also check event.get("data_json") as a fallback: if
event.get("data") is not a dict, attempt to use event.get("data_json") (if it's
a dict use it directly; if it's a JSON string parse it with json.loads) before
calling _first_text for draft_text/draft/task_input/input and
final_text/final/expected/correction. Ensure you still skip when neither source
yields a draft or final and keep using the existing _first_text calls and
variable names (data, draft, final).
In `@Gradata/tests/test_llm_synthesizer.py`:
- Around line 129-131: The test currently builds the fixture path with
Path("tests/...") which breaks when CWD isn't the repo root; update the code
that sets expected (in test_llm_synthesizer.py) to resolve the fixture relative
to the test file by using Path(__file__).parent (or appropriate ancestor) joined
with "fixtures/synthesize_prompt.golden.txt" before calling read_text so the
fixture lookup is independent of the current working directory.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ef933b46-7a6e-49c0-b452-68cc6820f840
📒 Files selected for processing (31)
Gradata/examples/tune_one_prompt.pyGradata/src/gradata/_correction_metrics.pyGradata/src/gradata/_manifest_metrics.pyGradata/src/gradata/cli.pyGradata/src/gradata/enhancements/experimental/__init__.pyGradata/src/gradata/enhancements/experimental/judgment_decay.pyGradata/src/gradata/enhancements/experimental/rules_distillation.pyGradata/src/gradata/enhancements/graduation/judgment_decay.pyGradata/src/gradata/enhancements/graduation/rules_distillation.pyGradata/src/gradata/enhancements/graduation/scoring.pyGradata/src/gradata/enhancements/llm_provider.pyGradata/src/gradata/enhancements/llm_synthesizer.pyGradata/src/gradata/enhancements/meta_rules.pyGradata/src/gradata/enhancements/rule_synthesizer.pyGradata/src/gradata/enhancements/scoring/correction_tracking.pyGradata/src/gradata/hooks/session_close.pyGradata/src/gradata/integrations/__init__.pyGradata/src/gradata/integrations/agent_lightning/__init__.pyGradata/src/gradata/integrations/agent_lightning/litagent.pyGradata/src/gradata/integrations/agent_lightning/reward.pyGradata/src/gradata/integrations/agent_lightning/runner.pyGradata/src/gradata/tuning/__init__.pyGradata/src/gradata/tuning/agent_lightning/__init__.pyGradata/src/gradata/tuning/agent_lightning/litagent.pyGradata/src/gradata/tuning/agent_lightning/reward.pyGradata/src/gradata/tuning/agent_lightning/runner.pyGradata/tests/fixtures/synthesize_prompt.golden.txtGradata/tests/test_agent_lightning_bridge.pyGradata/tests/test_graduation_scoring.pyGradata/tests/test_llm_synthesizer.pyGradata/tests/test_rule_synthesizer.py
💤 Files with no reviewable changes (5)
- Gradata/src/gradata/hooks/session_close.py
- Gradata/tests/test_rule_synthesizer.py
- Gradata/src/gradata/enhancements/rule_synthesizer.py
- Gradata/tests/test_graduation_scoring.py
- Gradata/src/gradata/enhancements/graduation/scoring.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.12
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.11
- GitHub Check: pytest macos-latest / py3.12
- GitHub Check: pytest windows-latest / py3.12
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/src/**/*.py: Prefersentence-transformersfor local embeddings,google-genaifor Gemini embeddings,cryptographyfor AES-GCM encrypted system.db,bm25sfor BM25 rule ranking, andmem0aifor external memory adapters — guard all optional dependency imports withtry / except ImportErrorat the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bareexcept: pass— use typed exceptions or at minimumlogger.warning(...)withexc_info=Trueto avoid silent failure in a memory product
Never import from out-of-scope sibling directories../Sprites/or../Hausgem/withingradata/*code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to../Sprites/,../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from insidegradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
Files:
Gradata/src/gradata/enhancements/experimental/__init__.pyGradata/src/gradata/tuning/__init__.pyGradata/src/gradata/cli.pyGradata/src/gradata/_correction_metrics.pyGradata/src/gradata/integrations/__init__.pyGradata/src/gradata/enhancements/scoring/correction_tracking.pyGradata/src/gradata/_manifest_metrics.pyGradata/src/gradata/integrations/agent_lightning/runner.pyGradata/src/gradata/enhancements/llm_synthesizer.pyGradata/src/gradata/enhancements/meta_rules.pyGradata/src/gradata/enhancements/llm_provider.pyGradata/src/gradata/tuning/agent_lightning/__init__.pyGradata/src/gradata/integrations/agent_lightning/reward.pyGradata/src/gradata/enhancements/graduation/judgment_decay.pyGradata/src/gradata/enhancements/graduation/rules_distillation.pyGradata/src/gradata/integrations/agent_lightning/__init__.pyGradata/src/gradata/integrations/agent_lightning/litagent.pyGradata/src/gradata/enhancements/experimental/rules_distillation.pyGradata/src/gradata/tuning/agent_lightning/reward.pyGradata/src/gradata/tuning/agent_lightning/litagent.pyGradata/src/gradata/tuning/agent_lightning/runner.pyGradata/src/gradata/enhancements/experimental/judgment_decay.py
Gradata/tests/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/tests/**/*.py: SetBRAIN_DIRenvironment variable viatmp_pathin conftest.py for test isolation — ensure_paths.pymodule cache refreshes when callingBrain.init()directly inside tests
Add unit tests intests/test_*.pyfor every CI push without LLM calls (deterministic); mark integration tests with@pytest.mark.integrationand skip them by default (they hit real LLM APIs)
Files:
Gradata/tests/test_agent_lightning_bridge.pyGradata/tests/test_llm_synthesizer.py
🔇 Additional comments (23)
Gradata/src/gradata/enhancements/llm_provider.py (1)
239-239: Docstring clarification is accurate and low-risk.The updated wording keeps behavior description clear without changing execution semantics.
Gradata/src/gradata/enhancements/meta_rules.py (2)
25-27: Reference updates now match the actual synthesis path.Good alignment to
llm_synthesizerand current provider-based flow.Also applies to: 64-65
410-413: Deterministic-vs-LLM responsibility boundary is clearly documented.This makes the
merge_into_metacontract easier to reason about for callers and tests.Gradata/tests/fixtures/synthesize_prompt.golden.txt (1)
1-9: Golden prompt fixture is crisp and contract-focused.This is a strong deterministic baseline for template-regression tests.
Gradata/tests/test_llm_synthesizer.py (1)
122-128: Great addition of a deterministic prompt-contract test.Locking
_build_promptoutput to a golden fixture materially reduces accidental prompt drift.Also applies to: 132-133, 136-136
Gradata/src/gradata/enhancements/llm_synthesizer.py (2)
96-96: Prompt-builder extraction at the call site is a clean refactor.This keeps synthesis flow unchanged while making prompt behavior testable in isolation.
155-165:_build_promptcentralizes template logic effectively.Nice cohesion improvement, and it pairs well with the new golden-fixture test.
Gradata/src/gradata/enhancements/experimental/__init__.py (1)
1-1: Docstring is clear and scoped correctly.
Good lightweight module marker for experimental-only wiring.Gradata/src/gradata/tuning/__init__.py (1)
1-1: Namespace description matches the new package role.
Looks good.Gradata/src/gradata/integrations/__init__.py (1)
14-16: Deprecation forwarding note is accurate and explicit.
The updated target paths/readability are solid.Gradata/src/gradata/_correction_metrics.py (1)
10-24: Shared ratio helper is clean and predictable.
Centralizing this logic reduces drift across metric call sites.Gradata/src/gradata/enhancements/graduation/judgment_decay.py (1)
7-15: Shim behavior is correct for staged migration.
Deprecation warning + forwarding import aligns with the compatibility plan.Gradata/examples/tune_one_prompt.py (1)
7-7: Example import was updated to the new tuning namespace correctly.
Looks consistent with the deprecation path.Gradata/src/gradata/cli.py (1)
510-510: CLI tuning import migration is correct.
This pointscmd_tuneat the canonicalgradata.tuningpath.Gradata/src/gradata/enhancements/scoring/correction_tracking.py (1)
19-20: Shared correction-rate helper adoption looks good.This keeps correction-rate behavior centralized and consistent with the metrics layer.
Also applies to: 373-373
Gradata/tests/test_agent_lightning_bridge.py (1)
20-20: Import path migration in tests is correct.The test now validates the new
gradata.tuning.agent_lightningsurface directly.Gradata/src/gradata/_manifest_metrics.py (1)
31-31: Correction-rate normalization is consistently applied.Using the shared helper in both trend and quality paths is a solid cleanup and reduces drift risk.
Also applies to: 105-105, 355-360
Gradata/src/gradata/integrations/agent_lightning/__init__.py (1)
1-1: Deprecation shim is implemented cleanly.Lazy forwarding plus preserved
__all__keeps backward compatibility while steering callers to the new module.Also applies to: 8-13, 16-16, 27-37
Gradata/src/gradata/integrations/agent_lightning/runner.py (1)
1-1: Runner shim migration path looks good.Deprecation warning + re-export preserves compatibility with minimal risk.
Also applies to: 7-12, 14-14
Gradata/src/gradata/tuning/agent_lightning/__init__.py (1)
15-24: New tuning package export surface is solid.The lazy
__getattr__pattern and__all__declaration provide a clean, stable API boundary.Also applies to: 27-40
Gradata/src/gradata/enhancements/graduation/rules_distillation.py (1)
7-12: Rules-distillation compatibility shim is well-structured.Deprecation notice and explicit re-exported API make this transition straightforward for downstream imports.
Also applies to: 14-20, 22-28
Gradata/src/gradata/integrations/agent_lightning/reward.py (1)
7-12: Reward shim migration is correctly implemented.The deprecated path still works and cleanly forwards to the new tuning namespace.
Also applies to: 14-14
Gradata/src/gradata/integrations/agent_lightning/litagent.py (1)
7-14: Deprecation shim looks correct and consistent.The warning + re-export behavior is clear and keeps backward compatibility intact.
| if session_type is None: | ||
| return True # backward compat: no filtering | ||
| testable_types = CATEGORY_SESSION_TYPES.get(category.upper(), ALL_SESSION_TYPES) | ||
| return session_type.lower() in testable_types |
There was a problem hiding this comment.
Guard unknown session_type values before membership check.
At Line 104, an unrecognized session_type (typo/new value) makes categories effectively non-testable, so decay gets skipped silently across the batch. Add an explicit guard/fallback before the lookup check.
Proposed fix
def is_category_testable(category: str, session_type: str | None) -> bool:
@@
if session_type is None:
return True # backward compat: no filtering
- testable_types = CATEGORY_SESSION_TYPES.get(category.upper(), ALL_SESSION_TYPES)
- return session_type.lower() in testable_types
+ normalized = session_type.lower()
+ if normalized not in ALL_SESSION_TYPES:
+ return True # backward compat for unknown/new session types
+ testable_types = CATEGORY_SESSION_TYPES.get(category.upper(), ALL_SESSION_TYPES)
+ return normalized in testable_types🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/experimental/judgment_decay.py` around lines
101 - 104, The membership check on session_type can silently skip decay for
typos/unexpected values; before calling session_type.lower() and looking up
CATEGORY_SESSION_TYPES, validate and normalize session_type (ensure it's a str,
strip/normalize casing) and explicitly guard unknown values: if the normalized
session_type is not present in CATEGORY_SESSION_TYPES keys and not in
ALL_SESSION_TYPES, return False (or log a warning) so unknown/typo values don't
make categories unintentionally non-testable; reference the session_type
variable, CATEGORY_SESSION_TYPES and ALL_SESSION_TYPES when applying this guard.
| # Representative description: most recent entry | ||
| representative = cat_entries[-1].description | ||
|
|
||
| # Check if already covered by existing rules | ||
| covered_by = _check_coverage( | ||
| " ".join(e.description for e in cat_entries), | ||
| existing_rules, | ||
| ) | ||
|
|
||
| sources = list({e.source for e in cat_entries}) | ||
| statuses = Counter(e.status for e in cat_entries) | ||
|
|
There was a problem hiding this comment.
Representative/source selection is order-unstable.
Line 89 assumes input order equals recency, and Line 97 builds evidence_sources from a set (non-deterministic order). This can produce inconsistent proposals/output.
Suggested fix
- # Representative description: most recent entry
- representative = cat_entries[-1].description
+ # Representative description: latest by date (expects ISO-8601 strings)
+ representative = max(cat_entries, key=lambda e: e.date).description
@@
- sources = list({e.source for e in cat_entries})
+ sources = sorted({e.source for e in cat_entries})🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/experimental/rules_distillation.py` around
lines 88 - 99, The code picks a representative description using cat_entries[-1]
and builds sources via an unordered set, which is non-deterministic; update the
logic to deterministically choose the most recent entry (e.g., select the entry
with the max timestamp/created_at field using max(cat_entries, key=lambda e:
e.timestamp or e.created_at)) and set representative = that_entry.description,
and replace sources = list({e.source for e in cat_entries}) with a sorted
deterministic list (e.g., sources = sorted({e.source for e in cat_entries}) or
sorted({e.source for e in cat_entries}, key=some_stable_key)) so both
representative selection and evidence_sources are stable across runs.
| for value in resources.values(): | ||
| if hasattr(value, "template") or isinstance(value, str): | ||
| return value | ||
| return self.prompt_template |
There was a problem hiding this comment.
Do not treat arbitrary string resources as prompt templates.
Line 100-Line 103 can pick unrelated string metadata from resources.values() and use it as the prompt template, producing invalid prompts.
Suggested fix
- for value in resources.values():
- if hasattr(value, "template") or isinstance(value, str):
- return value
+ for key in ("template", "prompt"):
+ candidate = resources.get(key)
+ if candidate is not None:
+ return candidate
return self.prompt_template🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/tuning/agent_lightning/litagent.py` around lines 100 -
103, The loop in litagent.py currently treats any string from resources.values()
as a prompt template (checking isinstance(value, str)) which can pick unrelated
metadata; change the condition to only accept real prompt-template objects
(e.g., objects exposing a template attribute or instances of the actual
PromptTemplate class your project uses) and remove the isinstance(value, str)
branch so only values with hasattr(value, "template") or isinstance(value,
PromptTemplate) are returned; update imports to reference PromptTemplate (or the
concrete template class used) and leave fallback to self.prompt_template
unchanged.
| except Exception as exc: | ||
| logger.debug("PromptTemplate.format failed, using raw template: %s", exc) | ||
| return str(prompt_template.template) |
There was a problem hiding this comment.
Escalate prompt-render fallback failures to warning with traceback.
Line 109-Line 111 currently hides prompt template errors behind debug-only logging, which makes rollout faults hard to diagnose.
Suggested fix
- except Exception as exc:
- logger.debug("PromptTemplate.format failed, using raw template: %s", exc)
+ except Exception:
+ logger.warning(
+ "PromptTemplate.format failed, using raw template",
+ exc_info=True,
+ )
return str(prompt_template.template)As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/tuning/agent_lightning/litagent.py` around lines 109 -
111, The current except block in litagent.py that catches formatting failures
for PromptTemplate (around the PromptTemplate.format call which returns
str(prompt_template.template)) only logs at debug level and hides tracebacks;
change the handler in the except Exception as exc within the function/method
that uses prompt_template.template to call logger.warning with a descriptive
message and include exc_info=True (to emit the traceback), keeping the fallback
return of str(prompt_template.template); do not suppress the exception or remove
the existing typed Exception catch.
| try: | ||
| results = brain.search(query, mode="events", top_k=20) | ||
| except Exception as exc: | ||
| logger.debug("brain.search failed during reward lookup: %s", exc) | ||
| results = [] | ||
|
|
||
| for result in results or []: | ||
| event = _event_from_search_result(result) | ||
| if event and event.get("type") == "CORRECTION": | ||
| events.append(event) | ||
|
|
||
| try: | ||
| history = brain.query_events(event_type="CORRECTION", limit=200) | ||
| except Exception as exc: | ||
| logger.debug("brain.query_events failed during reward lookup: %s", exc) | ||
| history = [] |
There was a problem hiding this comment.
Use warning logs with traceback for reward lookup failures.
Line 59 and Line 70 currently swallow lookup failures with debug-only logs, which can mask reward quality regressions in production.
Suggested fix
- except Exception as exc:
- logger.debug("brain.search failed during reward lookup: %s", exc)
+ except Exception:
+ logger.warning("brain.search failed during reward lookup", exc_info=True)
results = []
@@
- except Exception as exc:
- logger.debug("brain.query_events failed during reward lookup: %s", exc)
+ except Exception:
+ logger.warning("brain.query_events failed during reward lookup", exc_info=True)
history = []As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| results = brain.search(query, mode="events", top_k=20) | |
| except Exception as exc: | |
| logger.debug("brain.search failed during reward lookup: %s", exc) | |
| results = [] | |
| for result in results or []: | |
| event = _event_from_search_result(result) | |
| if event and event.get("type") == "CORRECTION": | |
| events.append(event) | |
| try: | |
| history = brain.query_events(event_type="CORRECTION", limit=200) | |
| except Exception as exc: | |
| logger.debug("brain.query_events failed during reward lookup: %s", exc) | |
| history = [] | |
| try: | |
| results = brain.search(query, mode="events", top_k=20) | |
| except Exception: | |
| logger.warning("brain.search failed during reward lookup", exc_info=True) | |
| results = [] | |
| for result in results or []: | |
| event = _event_from_search_result(result) | |
| if event and event.get("type") == "CORRECTION": | |
| events.append(event) | |
| try: | |
| history = brain.query_events(event_type="CORRECTION", limit=200) | |
| except Exception: | |
| logger.warning("brain.query_events failed during reward lookup", exc_info=True) | |
| history = [] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/tuning/agent_lightning/reward.py` around lines 57 - 72,
The current except blocks for brain.search and brain.query_events suppress
errors with logger.debug and lose traceback; update the except handlers in
reward lookup (around brain.search and brain.query_events in reward.py) to log
at warning level and include the exception traceback (pass exc_info=True or use
logger.warning(..., exc_info=exc)) while preserving existing fallback behavior
(setting results = [] and history = []), so failures are visible in production
for functions involving brain.search, brain.query_events, and subsequent
processing like _event_from_search_result.
| data = event.get("data") if isinstance(event.get("data"), dict) else {} | ||
| draft = _first_text(data, ("draft_text", "draft", "task_input", "input")) | ||
| final = _first_text(data, ("final_text", "final", "expected", "correction")) | ||
| if not draft or not final: | ||
| continue |
There was a problem hiding this comment.
Support data_json fallback when extracting correction dataset.
Line 148-Line 152 ignores events that store correction payload in data_json, which can silently drop valid training rows and undercut APO tuning.
Suggested fix
- data = event.get("data") if isinstance(event.get("data"), dict) else {}
+ data = event.get("data") if isinstance(event.get("data"), dict) else {}
+ if not data:
+ raw = event.get("data_json")
+ if isinstance(raw, str):
+ try:
+ parsed = json.loads(raw)
+ if isinstance(parsed, dict):
+ data = parsed
+ except json.JSONDecodeError:
+ pass
draft = _first_text(data, ("draft_text", "draft", "task_input", "input"))
final = _first_text(data, ("final_text", "final", "expected", "correction"))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/tuning/agent_lightning/runner.py` around lines 148 - 152,
The code currently only looks at event.get("data") for draft/final extraction
and skips events that store payload in data_json; update the extraction to also
check event.get("data_json") as a fallback: if event.get("data") is not a dict,
attempt to use event.get("data_json") (if it's a dict use it directly; if it's a
JSON string parse it with json.loads) before calling _first_text for
draft_text/draft/task_input/input and final_text/final/expected/correction.
Ensure you still skip when neither source yields a draft or final and keep using
the existing _first_text calls and variable names (data, draft, final).
| except Exception as exc: | ||
| logger.debug("APO did not expose a best prompt, using seed prompt: %s", exc) | ||
| return fallback |
There was a problem hiding this comment.
Use warning+traceback for best-prompt fallback exceptions.
Line 206-Line 208 currently logs best-prompt retrieval failures at debug level only, which can hide optimizer/runtime breakages.
Suggested fix
- except Exception as exc:
- logger.debug("APO did not expose a best prompt, using seed prompt: %s", exc)
+ except Exception:
+ logger.warning(
+ "APO did not expose a best prompt, using seed prompt",
+ exc_info=True,
+ )
return fallbackAs per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/tuning/agent_lightning/runner.py` around lines 206 - 208,
The except block that currently does "except Exception as exc" then logs at
debug and returns fallback should instead log a warning including the traceback
so failures aren't silently hidden; update the handler around the APO
best-prompt retrieval to call logger.warning with a clear message and
exc_info=True (keeping the "except Exception as exc" and the subsequent "return
fallback") so the exception and stacktrace for the best-prompt fallback are
recorded (refer to symbols APO, fallback, and logger in runner.py).
| expected = Path("tests/fixtures/synthesize_prompt.golden.txt").read_text( | ||
| encoding="utf-8" | ||
| ).rstrip("\n") |
There was a problem hiding this comment.
Make fixture path independent from current working directory.
Using Path("tests/...") can fail when tests are run outside repo-root CWD. Resolve from __file__ instead.
Suggested fix
- expected = Path("tests/fixtures/synthesize_prompt.golden.txt").read_text(
+ expected = (Path(__file__).resolve().parent / "fixtures" / "synthesize_prompt.golden.txt").read_text(
encoding="utf-8"
).rstrip("\n")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| expected = Path("tests/fixtures/synthesize_prompt.golden.txt").read_text( | |
| encoding="utf-8" | |
| ).rstrip("\n") | |
| expected = (Path(__file__).resolve().parent / "fixtures" / "synthesize_prompt.golden.txt").read_text( | |
| encoding="utf-8" | |
| ).rstrip("\n") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/tests/test_llm_synthesizer.py` around lines 129 - 131, The test
currently builds the fixture path with Path("tests/...") which breaks when CWD
isn't the repo root; update the code that sets expected (in
test_llm_synthesizer.py) to resolve the fixture relative to the test file by
using Path(__file__).parent (or appropriate ancestor) joined with
"fixtures/synthesize_prompt.golden.txt" before calling read_text so the fixture
lookup is independent of the current working directory.
Consolidates the remaining cleanup work from CLEANUP_ROADMAP.md. Follows up on PR #179 (Cleanup PR A).
PR B (Section 1+2):
graduation/scoring.py+ tests (alternate scoring spine, no production callers)judgment_decayandrules_distillationtoenhancements/experimental/with back-compat shims_correction_metrics.py; both_manifest_metricsandscoring/correction_trackingcall itPR C (Section 3):
rule_synthesizer.py+ tests (no production callers)meta_rulesdocstring to point atllm_synthesizer(the actual code path)llm_synthesizer._build_promptPR D (Section 6):
integrations/agent_lightningtotuning/agent_lightningDeprecationWarningcli.py,examples/tune_one_prompt.py, and tests to import fromtuning/Tests:
Risk: moderate. All renames have shims, deletes verified by
rgshowing zero production callers.Generated by codex/gpt-5.5 worker (proc_77cd9c62831a). Recovery patch landed via parent agent.