feat(v0.7.1): agent-lightning bridge — gradata tune (auto-improvement) by Gradata · Pull Request #172 · Gradata/gradata

Gradata · 2026-05-05T22:58:53Z

Summary

Wires Microsoft's agent-lightning APO (Automatic Prompt Optimization) algorithm into Gradata as the auto-improvement loop. The pitch: corrections become the reward signal; APO beam-searches prompt variants via textual gradients; no GPU required, runs on subscription CLIs via litellm proxy.

gradata tune <prompt-file> --brain ./my-brain produces an optimized prompt template scored against held-out corrections.

What's in

pyproject.toml extras: tune (basic agentlightning) and tune-apo (with APO algo). Bundled in all.
src/gradata/integrations/agent_lightning/ — bridge package (Layer 2):
- litagent.py — GradataLitAgent wrapper for Gradata-traced rollouts
- reward.py — gradata_reward() using brain.search() semantic match + difflib.SequenceMatcher
- runner.py — run_apo_tune() end-to-end (dataset split, in-memory store, APO trainer, optimized output)
src/gradata/cli.py — gradata tune command with --rounds, --beam, --branch, --brain, --out, --openai-api-base
examples/tune_one_prompt.py — end-to-end smoke
README "Auto-Improvement" section
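
The command surface listed above could be wired roughly as follows. This is a minimal argparse sketch, not the code in cli.py: the flag names and the `prompt_file` positional come from this PR, while the default values and help strings are placeholders chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gradata")
    sub = parser.add_subparsers(dest="command", required=True)

    p_tune = sub.add_parser("tune", help="Optimize a prompt file against corrections")
    p_tune.add_argument("prompt_file", help="Path to the prompt template to tune")
    # Defaults below are illustrative placeholders, not the project's real values.
    p_tune.add_argument("--rounds", type=int, default=1)
    p_tune.add_argument("--beam", type=int, default=2)
    p_tune.add_argument("--branch", type=int, default=2)
    p_tune.add_argument("--brain", default=None)
    p_tune.add_argument("--out", default=None)
    p_tune.add_argument("--openai-api-base", dest="openai_api_base", default=None)
    return parser


args = build_parser().parse_args(
    ["tune", "prompt.md", "--brain", "./my-brain", "--rounds", "3"]
)
```

The parsed namespace then carries `args.rounds`, `args.beam`, `args.branch`, and `args.openai_api_base`, which is the shape the `cmd_tune` handler quoted later in this thread reads from.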

Test plan

pytest tests/test_agent_lightning_bridge.py -xvs: 6/6 passed
pytest tests/ -x --timeout=60 -m "not integration": 4184 passed / 2 skipped / 5 deselected (was 4176; +6 new bridge tests + 2 picked up)
ruff check src/ tests/: pass
ruff format --check: 478 files clean
pyright src/: 0 errors, 16 existing optional-import warnings (no new)

Bridge tests use pytest.importorskip("agentlightning") so suite skips cleanly when extra not installed.

Layering check

No Layer 0 → 2 imports introduced. Bridge lives in integrations/ (Layer 2 territory).

Optional deps guarded at call site with try/except ImportError, never module-level — import gradata stays cheap.
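
A call-site guard of the kind described here might look like the sketch below. The helper name `load_optional` and the error wording are hypothetical; only the pattern (try/except ImportError inside the function that needs the dependency) is taken from the PR.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency only when a feature actually needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Hypothetical message; points the user at the extra that provides the module.
        raise RuntimeError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install 'gradata[{extra}]'"
        ) from exc
```

Because nothing is imported at module level, a plain `import gradata` never pays for the optional package, and a missing extra surfaces as a clear error only when the feature is used.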

Risk

No backward-incompatible changes. The existing 4176 tests are untouched; the new bridge tests are opt-in via the extra.
Optional dependency floor: agentlightning>=0.3.0 (the latest release on PyPI; the spec asked for >=0.3.1, which is not yet published).

Decisions made autonomously (per AGENTS.md OODA godmode)

Prompt template stays in caller's file (option c from spec) — no Hermes coupling
In-memory LightningStore for v0.7.1 (SQLite spike was 37% pass-rate, dropped — will use upstream's when MS writes the real one)
pytest --timeout shim in conftest.py instead of adding pytest-timeout to dev deps
agentlightning[apo]>=0.3.0 bundled into all so APO available under full optional install

Why this matters strategically

Council previously flagged Gradata's moat problem: open-source SDK + BYO LLM + vapor cloud = no defensibility. This commit reframes the product:

Before: "Gradata captures corrections and graduates them to rules."
After: "Gradata captures corrections, graduates them to rules, AND auto-tunes your prompts using those corrections — no GPU, runs on your existing subscriptions."

The "auto-improvement" pitch becomes a YC headline. Mem0/Letta/LangMem don't ship this.

🤖 Built by [delegate→codex/gpt-5.5], reviewed by Claude opus-4-7.

Wires Microsoft's agent-lightning APO algorithm (https://github.com/microsoft/agent-lightning) into Gradata as the auto-improvement loop. Uses corrections as the reward signal; runs on subscription CLIs via litellm proxy. No GPU required.

New deliverables:
- pyproject extras: tune (basic), tune-apo (with APO algo), bundled in 'all'
- src/gradata/integrations/agent_lightning/: bridge package
  * litagent.py: GradataLitAgent wrapper
  * reward.py: corrections → APO reward signal (brain.search + difflib)
  * runner.py: run_apo_tune() end-to-end
- src/gradata/cli.py: gradata tune <prompt-file> command
  * --rounds, --beam, --branch, --brain, --out, --openai-api-base
- examples/tune_one_prompt.py: end-to-end smoke
- README: Auto-Improvement section

Layer 2 integration (uses Layer 0/1, exposes public API). Optional deps guarded at call site (try/except ImportError), never module level.

Decisions made autonomously per AGENTS.md:
- Prompt template lives in caller's file, no Hermes coupling (option c from spec)
- In-memory store (SQLite spike abandoned; will use upstream's when ready)
- agentlightning>=0.3.0 (PyPI latest; spec said 0.3.1 but not yet published)
- pytest --timeout shim in conftest instead of new pytest-timeout dep

Tests: 4184 passed / 2 skipped / 5 deselected (was 4176, +6 new bridge tests + 2 picked up)
Lint: ruff check + format pass
Types: pyright 0 errors, 16 existing optional-import warnings

🤖 Built by [delegate→codex/gpt-5.5], reviewed by Claude opus-4-7.

coderabbitai · 2026-05-05T22:59:05Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 143f0b31-7b08-4b83-8d0a-3e34eaf4f5bb

📥 Commits

Reviewing files that changed from the base of the PR and between d5a5df2 and 76ac5b4.

📒 Files selected for processing (2)

.gitignore
Gradata/.gitignore

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: pytest windows-latest / py3.12
GitHub Check: pytest windows-latest / py3.11

🔇 Additional comments (2)

.gitignore (1)

210-210: LGTM!

The ignore entry for .hermes-backups/ is correctly formatted and appropriately placed under the Windows artifacts section.

Gradata/.gitignore (1)

12-12: LGTM!

The ignore entry for .hermes-backups/ is correctly formatted and appropriately placed with other test artifacts.

📝 Walkthrough

Integrates Microsoft agent-lightning APO for automatic prompt optimization via gradata tune CLI command with flags for rounds, beam width, branch factor, and output control
New public APIs: GradataLitAgent wrapper class, gradata_reward() function using corrections as reward signals, and run_apo_tune() orchestrator function
Optional dependencies: Added tune and tune-apo extras to pyproject.toml; both included in all extra; no runtime impact when unused
CLI enhancement: New gradata tune <prompt-file> subcommand with support for custom brain directories, rounds, and OpenAI API base configuration
Bridge architecture: Lives under integrations/ (Layer 2) with lazy imports; optional dependencies imported only at call sites to keep import gradata lightweight
Documentation & examples: Updated README with "Auto-Improvement" section; added examples/tune_one_prompt.py smoke test
Test coverage: 6 bridge tests added; all pass with pytest.importorskip guards for optional dependency scenarios
No breaking changes: Backward compatibility maintained; optional agentlightning[apo]>=0.3.0 dependency (spec desired >=0.3.1, not yet published)
Quality checks: Ruff and format pass; pyright shows 0 new errors (16 existing optional-import warnings); non-integration test suite: 4184 passed / 2 skipped
Minor enhancement: Improved vector output handling in diff_engine.py for numpy array compatibility

Walkthrough

Introduces an Agent-Lightning integration that auto-tunes prompts with APO, using Gradata corrections as the reward signal. Adds optional dependencies, a new CLI subcommand, reward and runner modules, a LitAgent wrapper, tests, and documentation. Includes a supporting fix to embedder output conversion.

Changes

Agent-Lightning Integration & Prompt Tuning

Layer / File(s)	Summary
Configuration & Dependencies `Gradata/pyproject.toml`	Added optional extras `tune` and `tune-apo` with `agentlightning>=0.3.0` and `agentlightning[apo]>=0.3.0`.
Reward & Scoring Logic `Gradata/src/gradata/integrations/agent_lightning/reward.py`	New public function `gradata_reward()` scores outputs against Gradata corrections using SequenceMatcher. Includes helpers for event search, task extraction, and final/draft matching with graceful fallback to 0.5 when no history exists.
APO Tuning Orchestration `Gradata/src/gradata/integrations/agent_lightning/runner.py`	Public function `run_apo_tune()` loads correction dataset, splits train/val, computes baseline score, optionally runs Agent-Lightning APO, and returns summary with optimized prompt and scores. Includes 7 helper functions for dataset loading, prompt rendering, and algorithm introspection.
LitAgent Integration Wrapper `Gradata/src/gradata/integrations/agent_lightning/litagent.py`	Introduces `GradataLitAgent` factory that binds a Brain and prompt template to a runtime LitAgent subclass. Implements `training_rollout()` and `validation_rollout()` using `_GradataLitAgentMixin` to render prompts, execute runners, compute rewards via `gradata_reward()`, and emit Agent-Lightning rewards.
Module Exports `Gradata/src/gradata/integrations/agent_lightning/__init__.py`	Lazy-loading module initializer exposing `GradataLitAgent`, `gradata_reward`, and `run_apo_tune` via `__getattr__` and `__all__` without enforcing runtime imports.
CLI Integration `Gradata/src/gradata/cli.py`	New `cmd_tune()` handler reads prompt from file, invokes `run_apo_tune()`, and outputs optimized prompt. Wired into CLI dispatch as `gradata tune` subcommand with arguments for prompt file, rounds, beam width, branch factor, brain directory, output file, and OpenAI API base.
Documentation & Examples `Gradata/README.md`, `Gradata/examples/tune_one_prompt.py`	README section describes `gradata tune` APO-based tuning with installation and usage guidance. New example script demonstrates end-to-end tuning with lightweight runner and parameter settings.
Supporting Fixes `Gradata/src/gradata/enhancements/diff_engine.py`	Hardened default embedder output conversion to robustly handle both numpy `.tolist()` and plain list-like vectors with appropriate type casts.
Tests `Gradata/tests/test_agent_lightning_bridge.py`, `Gradata/tests/conftest.py`	Comprehensive test suite validates `gradata_reward()` scoring (exact, partial, no-match), `GradataLitAgent` rollout emissions, `run_apo_tune()` workflow, and CLI end-to-end flow. Added pytest `--timeout` option hook for CI parity. Uses monkeypatching to inject fake agent-lightning module.
Cleanup `.gitignore`, `Gradata/.gitignore`	Added `.hermes-backups/` ignore pattern in both files.
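
The lazy `__getattr__` export pattern mentioned in the Module Exports row can be sketched as below. This is an illustration of module-level `__getattr__` (PEP 562), not the contents of the real `__init__.py`: the module is built in memory so the example is self-contained, and a stdlib target stands in for the heavy optional imports.

```python
import importlib
import types
from typing import Dict, Tuple


def make_lazy_module(name: str, exports: Dict[str, Tuple[str, str]]) -> types.ModuleType:
    """Build a module whose attributes are imported on first access (PEP 562)."""
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        try:
            target_module, target_attr = exports[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(target_module), target_attr)
        setattr(module, attr, value)  # cache so __getattr__ is not hit again
        return value

    module.__getattr__ = __getattr__
    module.__all__ = sorted(exports)
    return module


# Stand-in mapping: a stdlib target plays the role of the optional dependency.
lazy = make_lazy_module("lazy_demo", {"sqrt": ("math", "sqrt")})
```

In a real package the same function body would sit at the top level of `__init__.py`, so names listed in `__all__` resolve on first attribute access while importing the package itself stays cheap.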

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as gradata tune
    participant Runner as run_apo_tune
    participant Brain as Gradata Brain
    participant Dataset as Correction Dataset
    participant APO as Agent-Lightning APO
    participant OpenAI as OpenAI API

    User->>CLI: gradata tune PROMPT_FILE ...
    CLI->>Runner: run_apo_tune(brain_dir, prompt_template, runner_fn)
    Runner->>Brain: Load brain directory
    Runner->>Dataset: Read CORRECTION events from events.jsonl
    Dataset-->>Runner: tasks with draft/final pairs
    Runner->>Runner: Split dataset into train/val
    Runner->>Runner: Score baseline prompt (mean gradata_reward)
    alt rounds > 0
        Runner->>APO: Initialize APO with InMemoryLightningStore
        Runner->>APO: Configure Trainer with async OpenAI client
        loop APO Training Iterations
            APO->>Runner: Request prompt evaluation
            Runner->>OpenAI: Get response for task
            OpenAI-->>Runner: response text
            Runner->>Brain: Score response (gradata_reward via corrections)
            Brain-->>Runner: reward [0.0, 1.0]
            Runner->>APO: Emit reward for prompt variant
        end
        APO-->>Runner: Best optimized prompt
    end
    Runner->>Runner: Score optimized prompt on validation set
    Runner-->>CLI: {baseline_score, optimized_score, optimized_prompt, rounds_completed}
    CLI-->>User: Write/print optimized prompt and summary
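
The split-then-score steps in the diagram can be approximated with the sketch below. It assumes a runner of the form `runner_fn(prompt, task)` and a reward of the form `reward_fn(output, task)`; both signatures, the 25% validation fraction, and the task dictionary keys are assumptions made for illustration, not the shapes used in runner.py.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Task = dict


def split_dataset(tasks: Sequence[Task], val_fraction: float = 0.25) -> Tuple[List[Task], List[Task]]:
    """Deterministic split: the tail of the task list is held out for validation."""
    cut = len(tasks) - max(1, int(len(tasks) * val_fraction))
    return list(tasks[:cut]), list(tasks[cut:])


def score_prompt(
    prompt: str,
    tasks: Sequence[Task],
    runner_fn: Callable[[str, Task], str],
    reward_fn: Callable[[str, Task], float],
) -> float:
    """Mean reward of a prompt over a set of tasks (0.0 when there are none)."""
    if not tasks:
        return 0.0
    return mean(reward_fn(runner_fn(prompt, task), task) for task in tasks)


tasks = [{"input": str(i), "final": str(i)} for i in range(4)]
train, val = split_dataset(tasks)
baseline = score_prompt(
    "Answer: {input}",
    val,
    runner_fn=lambda prompt, task: task["input"],  # toy executor
    reward_fn=lambda out, task: 1.0 if out == task["final"] else 0.0,
)
```

Scoring the baseline and the optimized prompt with the same function on the same held-out tasks is what makes the two numbers in the summary comparable.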

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Gradata/gradata#85: The embedder output conversion hardening in diff_engine.py directly supports new semantic-embedding codepaths that consume embedder outputs.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: introducing an agent-lightning bridge with APO-based auto-prompt tuning via the 'gradata tune' command.
Description check	✅ Passed	The description comprehensively explains the feature integration, implementation details, test results, and strategic rationale—all directly related to the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/cli.py`:
- Around line 1414-1424: The CLI defines a --branch option on the p_tune parser
but that value is never used in cmd_tune; either wire it into the tuning
workflow or, if reserved for future use, add an explicit comment in the cmd_tune
function noting that args.branch is intentionally unused (e.g., "args.branch
reserved for caller workflows — intentionally unused") and keep the argument for
backward compatibility; reference the p_tune parser and the cmd_tune function
when making the change.

In `@Gradata/src/gradata/integrations/agent_lightning/litagent.py`:
- Around line 63-79: training_rollout currently calls _load_emit_reward() on
every invocation; cache the emit function on the instance instead by loading it
once during _init_gradata() and storing it as self.emit_reward, then update
training_rollout to call self.emit_reward(reward) (references: training_rollout,
_load_emit_reward, _init_gradata, emit_reward).

In `@Gradata/src/gradata/integrations/agent_lightning/runner.py`:
- Around line 55-56: The code currently falls back to _expected_runner when
runner_fn is None (effective_runner = runner_fn or _expected_runner), causing
baseline_score = _score_prompt(...) to use ground-truth answers and produce
misleading APO signals; change this to require an explicit executor by either
(a) refusing to proceed / raising a clear error if runner_fn is None, or (b)
wiring a real fast executor implementation instead of _expected_runner; update
the logic around effective_runner, the call sites that compute baseline_score
(function _score_prompt) and any similar fallbacks around lines referenced (also
apply the same change to the repeated block around the 200-204 region) so APO
never silently defaults to the expected-label runner.
- Around line 41-42: The code currently mutates os.environ["OPENAI_API_BASE"];
instead, update _new_async_openai to accept an optional base_url parameter and
pass that into the AsyncOpenAI constructor (AsyncOpenAI(base_url=base_url)) so
clients use the provided openai_api_base without touching global env; then
change the caller to forward openai_api_base into _new_async_openai and remove
the os.environ assignment entirely.
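
The last finding (pass the base URL to the client instead of mutating the environment) could be addressed along these lines. This is a sketch under assumptions: the real `_new_async_openai` signature is not shown in this thread, and the `client_cls` parameter is a hypothetical injection point added here so the example can run without the openai package installed.

```python
from typing import Any, Optional


def new_async_openai(base_url: Optional[str] = None, client_cls: Optional[type] = None) -> Any:
    """Build an async client bound to base_url without touching os.environ."""
    if client_cls is None:
        try:
            # Optional dependency, imported at the call site.
            from openai import AsyncOpenAI
        except ImportError as exc:
            raise RuntimeError("the openai package is required for tuning") from exc
        client_cls = AsyncOpenAI
    # base_url=None lets the client fall back to its own default endpoint.
    return client_cls(base_url=base_url)
```

Keeping the endpoint on the client object means two tuning runs in the same process can target different proxies, and nothing leaks into unrelated code that also reads `OPENAI_API_BASE`.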


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 49c385be-d390-44bf-8c91-836bddbe647f

📥 Commits

Reviewing files that changed from the base of the PR and between b56816b and 87283ab.

⛔ Files ignored due to path filters (1)

Gradata/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (11)

Gradata/README.md
Gradata/examples/tune_one_prompt.py
Gradata/pyproject.toml
Gradata/src/gradata/cli.py
Gradata/src/gradata/enhancements/diff_engine.py
Gradata/src/gradata/integrations/agent_lightning/__init__.py
Gradata/src/gradata/integrations/agent_lightning/litagent.py
Gradata/src/gradata/integrations/agent_lightning/reward.py
Gradata/src/gradata/integrations/agent_lightning/runner.py
Gradata/tests/conftest.py
Gradata/tests/test_agent_lightning_bridge.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: pytest ubuntu-latest / py3.11
GitHub Check: pytest windows-latest / py3.11
GitHub Check: pytest macos-latest / py3.12
GitHub Check: pytest ubuntu-latest / py3.12
GitHub Check: pytest macos-latest / py3.11
GitHub Check: pytest windows-latest / py3.12

🧰 Additional context used

📓 Path-based instructions (3)

Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

Gradata/tests/conftest.py
Gradata/tests/test_agent_lightning_bridge.py

Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

Gradata/src/gradata/integrations/agent_lightning/runner.py
Gradata/src/gradata/integrations/agent_lightning/__init__.py
Gradata/src/gradata/enhancements/diff_engine.py
Gradata/src/gradata/cli.py
Gradata/src/gradata/integrations/agent_lightning/reward.py
Gradata/src/gradata/integrations/agent_lightning/litagent.py

Gradata/**/pyproject.toml

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Maintain dependencies = [] in pyproject.toml — the base package is pure Python + stdlib with all heavy dependencies gated as optional extras: embeddings, gemini, encrypted, ranking, adapters-mem0

Files:

Gradata/pyproject.toml

🔇 Additional comments (9)

Gradata/tests/conftest.py (1)

25-28: LGTM!

The pytest_addoption hook correctly registers the --timeout flag as a passthrough, preventing test failures when CI passes this flag but pytest-timeout is not installed.
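
A passthrough of this kind might be written as in the sketch below. The helper `register_timeout_passthrough` is a hypothetical name, and the check for an installed pytest-timeout is an added safeguard that the PR does not mention; only the use of `pytest_addoption` to accept `--timeout` comes from the review.

```python
import importlib.util


def register_timeout_passthrough(parser, plugin_installed: bool) -> bool:
    """Register a no-op --timeout option unless pytest-timeout already provides it."""
    if plugin_installed:
        return False  # the real plugin registers --timeout; adding it twice would fail
    parser.addoption(
        "--timeout",
        action="store",
        default=None,
        help="Accepted for CI parity; ignored when pytest-timeout is not installed.",
    )
    return True


def pytest_addoption(parser):
    """pytest hook: called once at startup with pytest's option parser."""
    installed = importlib.util.find_spec("pytest_timeout") is not None
    register_timeout_passthrough(parser, installed)
```

With this in conftest.py, `pytest --timeout=60` parses cleanly either way, though the shim only accepts the flag and does not enforce any time limit.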

Gradata/README.md (1)

127-142: LGTM!

The Auto-Improvement documentation clearly describes the gradata tune command, installation via gradata[tune-apo], and accurately explains the reward signal mechanism. The example command matches the CLI implementation.

Gradata/src/gradata/enhancements/diff_engine.py (1)

290-292: LGTM!

The vector conversion logic correctly handles both numpy arrays (via tolist()) and other iterable types. The type casts are appropriate for static analysis without affecting runtime behavior.
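
The conversion described here can be illustrated with a small helper. The function name `to_float_list` is hypothetical and this is not the code in diff_engine.py; it only shows the duck-typed handling of objects that expose `.tolist()` alongside plain iterables.

```python
from typing import Any, List


def to_float_list(vector: Any) -> List[float]:
    """Convert an embedder output to a plain list of floats."""
    tolist = getattr(vector, "tolist", None)
    # numpy arrays expose .tolist(); tuples, lists, and generators do not.
    raw = tolist() if callable(tolist) else list(vector)
    return [float(x) for x in raw]
```

Checking for the method rather than the type keeps numpy out of the import path, which matters when the embeddings extra is not installed.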

Gradata/src/gradata/integrations/agent_lightning/reward.py (2)

16-30: LGTM!

The reward computation is well-structured with proper null handling and graceful degradation (returning 0.5 when no matching corrections exist). The early return on empty finals prevents the max() call on an empty sequence.
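
The scoring behavior described above (best SequenceMatcher ratio against known finals, with a neutral 0.5 when there is no history) can be sketched as follows. The function name and signature are simplified stand-ins: the real `gradata_reward()` also looks up corrections through the brain, which is omitted here.

```python
from difflib import SequenceMatcher
from typing import Iterable

NEUTRAL_REWARD = 0.5  # returned when no matching corrections exist


def similarity_reward(output: str, finals: Iterable[str]) -> float:
    """Best similarity between the output and any corrected final, in [0.0, 1.0]."""
    candidates = [final for final in finals if final]
    if not candidates:
        # Early return avoids calling max() on an empty sequence.
        return NEUTRAL_REWARD
    return max(SequenceMatcher(None, output, final).ratio() for final in candidates)
```

A neutral fallback keeps tasks with no correction history from pulling the mean toward either extreme during prompt search.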

55-74: LGTM!

Error handling is appropriately defensive — both brain.search() and brain.query_events() failures are caught and logged at debug level, falling back gracefully to empty results rather than propagating exceptions.

Gradata/src/gradata/integrations/agent_lightning/litagent.py (1)

119-135: LGTM!

The factory pattern using __new__ to create a runtime subclass that inherits from both the mixin and LitAgent is a clean approach for integrating with an optional dependency while preserving type safety.
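
For readers unfamiliar with the pattern, the sketch below shows a factory whose `__new__` builds a runtime subclass from a mixin and a base class supplied at call time. All names are hypothetical, and `FakeBase` stands in for the framework's LitAgent class, which is not imported here.

```python
class RewardMixin:
    """Behavior the bridge adds on top of the framework base class."""

    def render(self, task: str) -> str:
        return self.template.format(task=task)


class AgentFactory:
    """Calling AgentFactory(...) returns an instance of a runtime subclass."""

    def __new__(cls, base_cls: type, template: str):
        # Mixin first so its methods take precedence in the MRO.
        runtime_cls = type(f"Runtime{base_cls.__name__}", (RewardMixin, base_cls), {})
        instance = runtime_cls()
        instance.template = template
        return instance


class FakeBase:  # stand-in for the optional framework class
    pass


agent = AgentFactory(FakeBase, "Summarize: {task}")
```

Because the base class is only needed when the factory is called, the module that defines the factory can be imported without the optional framework present.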

Gradata/examples/tune_one_prompt.py (1)

10-29: LGTM!

The example clearly demonstrates the tuning workflow with a mock runner function and explains that users should replace it with their OpenAI-compatible client call. The result dictionary access matches the keys returned by run_apo_tune.

Gradata/src/gradata/cli.py (1)

508-538: LGTM!

The cmd_tune implementation correctly:

Imports the runner at call site (lazy loading for optional dependency)

Uses _resolve_brain_root for consistent brain directory resolution

Handles output to file or stdout based on --out flag

Reports meaningful summary with baseline/optimized scores

Gradata/pyproject.toml (1)

35-40: agentlightning>=0.3.0 is available on PyPI and the version constraint correctly allows the current release while accepting the desired 0.3.1 once published.

- runner.py: raise ValueError when runner_fn is None (no silent ground-truth fallback)
- runner.py: pass base_url to AsyncOpenAI directly instead of mutating os.environ
- litagent.py: cache emit_reward in __init__ instead of per-call lookup
- cli.py: wire --branch flag through to branch_factor arg
- tests: add test_run_apo_tune_rejects_missing_runner_fn

All 4 CodeRabbit blocking findings cleared.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.hermes-backups/shannon-gradata-partial-20260430-192137/deliverables:
- Line 1: Remove the
`.hermes-backups/shannon-gradata-partial-20260430-192137/deliverables` backup
subproject-pointer file from the PR and ensure it is not committed to mainline;
delete the file from the branch (or revert the add) and update .gitignore to
include the `.hermes-backups/` path so future backup artifacts are ignored.
Locate the added artifact by name `deliverables` under the
`.hermes-backups/shannon-gradata-partial-20260430-192137/` directory in the diff
and remove that change from the commit history or create a new commit that
deletes it, then add `.hermes-backups/` to the repo ignore rules (or confirm an
existing ignore covers that pattern).

In `@Gradata/src/gradata/cli.py`:
- Around line 508-521: cmd_tune currently calls run_apo_tune without a
runner_fn, but run_apo_tune now rejects runner_fn=None; fix by constructing and
passing a real prompt executor to run_apo_tune: import or instantiate your
project's prompt executor (e.g., create_prompt_executor(...) or PromptExecutor
and its .run method) inside cmd_tune after reading the prompt, create a callable
runner_fn that accepts the same signature run_apo_tune expects, then call
run_apo_tune(..., runner_fn=runner_fn, prompt_template=prompt,
rounds=args.rounds, beam_width=args.beam, branch_factor=args.branch,
openai_api_base=args.openai_api_base) while keeping _resolve_brain_root(args)
for the brain root argument.
- Around line 529-537: The summary line currently prints to stdout after
printing the optimized prompt (variables optimized and result.get(...)), which
contaminates pipelines; change the summary print to write to stderr instead
(e.g., use sys.stderr or the CLI's logger) so stdout remains only the optimized
prompt. Locate the block in gradata.cli where optimized is printed and replace
the final print call that formats "baseline=... optimized=... rounds=..." to
emit to stderr (reference the formatted string using
baseline=float(result.get("baseline_score", 0.0)),
optimized=float(result.get("optimized_score", 0.0)),
rounds=int(result.get("rounds_completed", 0))).

In `@Gradata/tests/test_agent_lightning_bridge.py`:
- Around line 15-18: Remove the module-level pytest skip that uses
find_spec("agentlightning") in tests/test_agent_lightning_bridge.py so the
deterministic bridge tests run in CI even when the optional extra isn't
installed; instead, keep only true integration tests behind optional-dependency
skips and rely on the existing test helper _install_fake_agentlightning() to
stub the agentlightning module for these unit tests. Locate the pytestmark =
pytest.mark.skipif(...) declaration and delete or disable it, ensuring the test
file continues to call _install_fake_agentlightning() at setup and that any
real-LM integration cases are explicitly marked with `@pytest.mark.integration`
and left skipped by default.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cde24b34-f05a-429c-bf59-28b84ec3a773

📥 Commits

Reviewing files that changed from the base of the PR and between 87283ab and d5a5df2.

📒 Files selected for processing (5)

.hermes-backups/shannon-gradata-partial-20260430-192137/deliverables
Gradata/src/gradata/cli.py
Gradata/src/gradata/integrations/agent_lightning/litagent.py
Gradata/src/gradata/integrations/agent_lightning/runner.py
Gradata/tests/test_agent_lightning_bridge.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: pytest macos-latest / py3.12
GitHub Check: pytest ubuntu-latest / py3.12
GitHub Check: pytest windows-latest / py3.11
GitHub Check: pytest ubuntu-latest / py3.11
GitHub Check: pytest macos-latest / py3.11
GitHub Check: pytest windows-latest / py3.12


coderabbitai · 2026-05-06T04:38:26Z

+def cmd_tune(args):
+    """Tune a prompt file with Agent-Lightning APO and Gradata corrections."""
+    from gradata.integrations.agent_lightning.runner import run_apo_tune
+
+    prompt_path = Path(args.prompt_file)
+    prompt = prompt_path.read_text(encoding="utf-8")
+    result = run_apo_tune(
+        _resolve_brain_root(args),
+        prompt_template=prompt,
+        rounds=args.rounds,
+        beam_width=args.beam,
+        branch_factor=args.branch,
+        openai_api_base=args.openai_api_base,
+    )


⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Wire a real prompt executor into cmd_tune before exposing this command.

run_apo_tune() now rejects runner_fn=None, but this handler never constructs or passes one. That means every gradata tune ... invocation fails immediately with ValueError, so the new CLI path is not actually usable yet.
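
The contract at issue can be illustrated with a toy version of the guard and a caller that satisfies it. Everything here is hypothetical: the real `runner_fn` signature is not shown in this thread, so the sketch assumes the simplest shape (rendered prompt in, model text out), and `upper_runner` is a placeholder rather than a real executor.

```python
from typing import Callable, List, Optional

RunnerFn = Callable[[str], str]  # assumed shape: rendered prompt in, model text out


def tune_outputs(
    prompt_template: str,
    tasks: List[str],
    runner_fn: Optional[RunnerFn] = None,
) -> List[str]:
    """Run every task through an explicitly supplied executor."""
    if runner_fn is None:
        # No silent fallback: scoring against ground truth would be misleading.
        raise ValueError("runner_fn is required; pass a callable that executes the prompt")
    return [runner_fn(prompt_template.format(task=task)) for task in tasks]


def upper_runner(prompt: str) -> str:
    """Placeholder executor; a real one would call an OpenAI-compatible client."""
    return prompt.upper()


outputs = tune_outputs("Fix: {task}", ["typo"], runner_fn=upper_runner)
```

A CLI handler that never builds such a callable hits the ValueError branch on every invocation, which is the failure this comment describes.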


coderabbitai · 2026-05-06T04:38:26Z

+    else:
+        print(optimized)
+
+    print(
+        "baseline={baseline:.3f} optimized={optimized:.3f} rounds={rounds}".format(
+            baseline=float(result.get("baseline_score", 0.0)),
+            optimized=float(result.get("optimized_score", 0.0)),
+            rounds=int(result.get("rounds_completed", 0)),
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep stdout reserved for the optimized prompt.

When --out is omitted, the prompt body is printed and then this summary line is appended to stdout. That breaks gradata tune prompt.md > optimized.md and any pipeline that treats stdout as the prompt text. Emit the metrics on stderr instead.

💡 Minimal fix

     else:
         print(optimized)
 
     print(
         "baseline={baseline:.3f} optimized={optimized:.3f} rounds={rounds}".format(
             baseline=float(result.get("baseline_score", 0.0)),
             optimized=float(result.get("optimized_score", 0.0)),
             rounds=int(result.get("rounds_completed", 0)),
-        )
+        ),
+        file=sys.stderr,
     )


coderabbitai · 2026-05-06T04:38:26Z
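The stdout/stderr split suggested above can be checked in isolation. The sketch below is illustrative only: `emit_tune_result` is a hypothetical helper, not a function in `gradata.cli`, and it takes the streams as parameters so the behavior can be verified without a real terminal. The result keys are the ones quoted in the review comment.

```python
import io
import sys
from typing import Any, Mapping, Optional, TextIO


def emit_tune_result(
    optimized: str,
    result: Mapping[str, Any],
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> None:
    """Write the prompt body to `out` and the metrics summary to `err`."""
    # Resolve the streams at call time so redirection is respected.
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print(optimized, file=out)
    summary = "baseline={baseline:.3f} optimized={optimized:.3f} rounds={rounds}".format(
        baseline=float(result.get("baseline_score", 0.0)),
        optimized=float(result.get("optimized_score", 0.0)),
        rounds=int(result.get("rounds_completed", 0)),
    )
    print(summary, file=err)


# Capture both streams to confirm that only the prompt reaches `out`.
captured_out, captured_err = io.StringIO(), io.StringIO()
emit_tune_result(
    "PROMPT",
    {"baseline_score": 0.5, "optimized_score": 0.75, "rounds_completed": 3},
    out=captured_out,
    err=captured_err,
)
```

With this split, a shell redirect such as `gradata tune prompt.md > optimized.md` would capture only the prompt text, while the summary line stays visible on the terminal.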

+pytestmark = pytest.mark.skipif(
+    find_spec("agentlightning") is None,
+    reason="agentlightning is not installed",
+)


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Don’t skip these deterministic bridge tests when the optional extra is missing.

These tests already stub agentlightning with _install_fake_agentlightning(), so the module-level skipif(find_spec(...)) removes CI coverage for the bridge in the exact no-extra environment this PR claims to support. Keep only true real-dependency integration cases behind optional-dependency skips.

As per coding guidelines, "Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/tests/test_agent_lightning_bridge.py` around lines 15 - 18, Remove the module-level pytest skip that uses find_spec("agentlightning") in tests/test_agent_lightning_bridge.py so the deterministic bridge tests run in CI even when the optional extra isn't installed; instead, keep only true integration tests behind optional-dependency skips and rely on the existing test helper _install_fake_agentlightning() to stub the agentlightning module for these unit tests. Locate the pytestmark = pytest.mark.skipif(...) declaration and delete or disable it, ensuring the test file continues to call _install_fake_agentlightning() at setup and that any real-LM integration cases are explicitly marked with `@pytest.mark.integration` and left skipped by default.
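The stubbing approach the comment relies on can be sketched as follows. The body of `_install_fake_agentlightning()` is not shown in this thread, so the helper below is a hypothetical reconstruction of the general technique: register a stub module in `sys.modules` so that imports succeed without the optional extra. `install_fake_module`, `FakeAPO`, and the module name `fake_agentlightning_demo` are illustrative names, chosen so the sketch does not shadow a real installation.

```python
import importlib
import sys
import types


def install_fake_module(name: str, **attrs: object) -> types.ModuleType:
    """Register a stub module under `name` so importing it needs no real package."""
    module = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    sys.modules[name] = module
    return module


class FakeAPO:
    """Hypothetical stand-in for an optimizer class; it only records its kwargs."""

    def __init__(self, **kwargs: object) -> None:
        self.kwargs = kwargs


# The import system consults sys.modules first, so this returns the stub.
install_fake_module("fake_agentlightning_demo", APO=FakeAPO)
bridge_dep = importlib.import_module("fake_agentlightning_demo")
```

Because the stub satisfies the import, a module-level `skipif` on the real package is unnecessary for deterministic unit tests; only cases that need the real dependency have to stay behind a skip.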

greptile-apps Bot reviewed May 5, 2026

coderabbitai Bot added the feature label May 5, 2026

coderabbitai Bot requested changes May 5, 2026

Comment thread Gradata/src/gradata/cli.py

Comment thread Gradata/src/gradata/integrations/agent_lightning/litagent.py

Comment thread Gradata/src/gradata/integrations/agent_lightning/runner.py Outdated

Comment thread Gradata/src/gradata/integrations/agent_lightning/runner.py Outdated

greptile-apps Bot reviewed May 6, 2026

chore: gitignore .hermes-backups (accidentally added as gitlink)

76ac5b4

greptile-apps Bot reviewed May 6, 2026

coderabbitai Bot requested changes May 6, 2026

Gradata merged commit 3d7b7f8 into main May 6, 2026
9 checks passed

Gradata deleted the v0.7.1-agent-lightning-bridge branch May 6, 2026 08:05

Gradata mentioned this pull request May 6, 2026

feat(v0.7.3): brain.prove() holdout validation + gradata demo + one-line installer #176

Merged

coderabbitai Bot mentioned this pull request May 6, 2026

chore: cleanup PRs B+C+D from CLEANUP_ROADMAP #182

Merged
