
feat(v0.7.1): agent-lightning bridge — gradata tune (auto-improvement) #172

Merged
Gradata merged 3 commits into main from v0.7.1-agent-lightning-bridge on May 6, 2026

@Gradata Gradata commented May 5, 2026

Owner

Summary

Wires Microsoft's agent-lightning APO (Automatic Prompt Optimization) algorithm into Gradata as the auto-improvement loop. The pitch: corrections become the reward signal; APO beam-searches prompt variants via textual gradients; no GPU required, runs on subscription CLIs via litellm proxy.

gradata tune <prompt-file> --brain ./my-brain produces an optimized prompt template scored against held-out corrections.

What's in

  • pyproject.toml extras: tune (basic agentlightning) and tune-apo (with APO algo). Bundled in all.
  • src/gradata/integrations/agent_lightning/ — bridge package (Layer 2):
    • litagent.py: GradataLitAgent wrapper for Gradata-traced rollouts
    • reward.py: gradata_reward() using brain.search() semantic match + difflib.SequenceMatcher
    • runner.py: run_apo_tune() end-to-end (dataset split, in-memory store, APO trainer, optimized output)
  • src/gradata/cli.py: gradata tune command with --rounds, --beam, --branch, --brain, --out, --openai-api-base
  • examples/tune_one_prompt.py — end-to-end smoke
  • README "Auto-Improvement" section

Test plan

  • pytest tests/test_agent_lightning_bridge.py -xvs: 6/6 passed
  • pytest tests/ -x --timeout=60 -m "not integration": 4184 passed / 2 skipped / 5 deselected (was 4176; +6 new bridge tests + 2 picked up)
  • ruff check src/ tests/: pass
  • ruff format --check: 478 files clean
  • pyright src/: 0 errors, 16 existing optional-import warnings (no new)

Bridge tests use pytest.importorskip("agentlightning") so suite skips cleanly when extra not installed.

Layering check

No Layer 0 → 2 imports introduced. Bridge lives in integrations/ (Layer 2 territory).

Optional deps guarded at call site with try/except ImportError, never module-level — import gradata stays cheap.
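
The call-site guard described above can be sketched as follows. This is a minimal illustration, not the bridge's actual code: the helper name `require_optional`, the function `tune_entrypoint`, and the error wording are assumptions; only the module name `agentlightning` and the `tune-apo` extra come from this PR.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency only when a feature actually needs it.

    Hypothetical helper: the name and the message are illustrative assumptions.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail with an actionable hint at call time, not at package import time.
        raise RuntimeError(
            f"{module_name!r} is not installed; try: pip install 'gradata[{extra}]'"
        ) from exc


def tune_entrypoint():
    """Hypothetical feature entry point that needs the optional package."""
    # The import happens inside the function, so importing the parent
    # package never pays for (or fails on) the optional dependency.
    return require_optional("agentlightning", "tune-apo")
```

Because nothing is imported at module level, a user without the extra only sees the error when they actually invoke the tuning feature.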

Risk

  • No backward-incompatible changes. The existing 4176 tests are untouched; the new bridge tests are opt-in via the extra.
  • Optional dependency floor: agentlightning>=0.3.0 (PyPI latest; spec said >=0.3.1 but not yet published).

Decisions made autonomously (per AGENTS.md OODA godmode)

  • Prompt template stays in caller's file (option c from spec) — no Hermes coupling
  • In-memory LightningStore for v0.7.1 (SQLite spike was 37% pass-rate, dropped — will use upstream's when MS writes the real one)
  • pytest --timeout shim in conftest.py instead of adding pytest-timeout to dev deps
  • agentlightning[apo]>=0.3.0 bundled into all so APO available under full optional install

Why this matters strategically

Council previously flagged Gradata's moat problem: open-source SDK + BYO LLM + vapor cloud = no defensibility. This commit reframes the product:

Before: "Gradata captures corrections and graduates them to rules."
After: "Gradata captures corrections, graduates them to rules, AND auto-tunes your prompts using those corrections — no GPU, runs on your existing subscriptions."

The "auto-improvement" pitch becomes a YC headline. Mem0/Letta/LangMem don't ship this.

🤖 Built by [delegate→codex/gpt-5.5], reviewed by Claude opus-4-7.

Wires Microsoft's agent-lightning APO algorithm (https://github.com/microsoft/agent-lightning)
into Gradata as the auto-improvement loop. Uses corrections as the reward
signal; runs on subscription CLIs via litellm proxy. No GPU required.

New deliverables:
- pyproject extras: tune (basic), tune-apo (with APO algo), bundled in 'all'
- src/gradata/integrations/agent_lightning/ — bridge package
  * litagent.py — GradataLitAgent wrapper
  * reward.py — corrections → APO reward signal (brain.search + difflib)
  * runner.py — run_apo_tune() end-to-end
- src/gradata/cli.py — gradata tune <prompt-file> command
  * --rounds, --beam, --branch, --brain, --out, --openai-api-base
- examples/tune_one_prompt.py — end-to-end smoke
- README — Auto-Improvement section

Layer 2 integration (uses Layer 0/1, exposes public API).
Optional deps guarded at call site (try/except ImportError), never module level.

Decisions made autonomously per AGENTS.md:
- Prompt template lives in caller's file, no Hermes coupling (option c from spec)
- In-memory store (SQLite spike abandoned — will use upstream's when ready)
- agentlightning>=0.3.0 (PyPI latest; spec said 0.3.1 but not yet published)
- pytest --timeout shim in conftest instead of new pytest-timeout dep

Tests: 4184 passed / 2 skipped / 5 deselected (was 4176, +6 new bridge tests + 2 picked up)
Lint: ruff check + format pass
Types: pyright 0 errors, 16 existing optional-import warnings

🤖 Built by [delegate→codex/gpt-5.5], reviewed by Claude opus-4-7.

coderabbitai Bot commented May 5, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 143f0b31-7b08-4b83-8d0a-3e34eaf4f5bb

📥 Commits

Reviewing files that changed from the base of the PR and between d5a5df2 and 76ac5b4.

📒 Files selected for processing (2)
  • .gitignore
  • Gradata/.gitignore
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
🔇 Additional comments (2)
.gitignore (1)

210-210: LGTM!

The ignore entry for .hermes-backups/ is correctly formatted and appropriately placed under the Windows artifacts section.

Gradata/.gitignore (1)

12-12: LGTM!

The ignore entry for .hermes-backups/ is correctly formatted and appropriately placed with other test artifacts.


📝 Walkthrough
  • Integrates Microsoft agent-lightning APO for automatic prompt optimization via gradata tune CLI command with flags for rounds, beam width, branch factor, and output control
  • New public APIs: GradataLitAgent wrapper class, gradata_reward() function using corrections as reward signals, and run_apo_tune() orchestrator function
  • Optional dependencies: Added tune and tune-apo extras to pyproject.toml; both included in all extra; no runtime impact when unused
  • CLI enhancement: New gradata tune <prompt-file> subcommand with support for custom brain directories, rounds, and OpenAI API base configuration
  • Bridge architecture: Lives under integrations/ (Layer 2) with lazy imports; optional dependencies imported only at call sites to keep import gradata lightweight
  • Documentation & examples: Updated README with "Auto-Improvement" section; added examples/tune_one_prompt.py smoke test
  • Test coverage: 6 bridge tests added; all pass with pytest.importorskip guards for optional dependency scenarios
  • No breaking changes: Backward compatibility maintained; optional agentlightning[apo]>=0.3.0 dependency (spec desired >=0.3.1, not yet published)
  • Quality checks: Ruff and format pass; pyright shows 0 new errors (16 existing optional-import warnings); non-integration test suite: 4184 passed / 2 skipped
  • Minor enhancement: Improved vector output handling in diff_engine.py for numpy array compatibility

Walkthrough

Introduces an Agent-Lightning integration that auto-tunes Gradata prompts with APO. Adds optional dependencies, a new CLI subcommand, reward and runner modules, a LitAgent wrapper, comprehensive tests, and documentation. Includes a supporting fix to embedder output conversion.

Changes

Agent-Lightning Integration & Prompt Tuning

Layer | File(s) | Summary
Configuration & Dependencies | Gradata/pyproject.toml | Added optional extras tune and tune-apo with agentlightning>=0.3.0 and agentlightning[apo]>=0.3.0.
Reward & Scoring Logic | Gradata/src/gradata/integrations/agent_lightning/reward.py | New public function gradata_reward() scores outputs against Gradata corrections using SequenceMatcher. Includes helpers for event search, task extraction, and final/draft matching with graceful fallback to 0.5 when no history exists.
APO Tuning Orchestration | Gradata/src/gradata/integrations/agent_lightning/runner.py | Public function run_apo_tune() loads correction dataset, splits train/val, computes baseline score, optionally runs Agent-Lightning APO, and returns summary with optimized prompt and scores. Includes 7 helper functions for dataset loading, prompt rendering, and algorithm introspection.
LitAgent Integration Wrapper | Gradata/src/gradata/integrations/agent_lightning/litagent.py | Introduces GradataLitAgent factory that binds a Brain and prompt template to a runtime LitAgent subclass. Implements training_rollout() and validation_rollout() using _GradataLitAgentMixin to render prompts, execute runners, compute rewards via gradata_reward(), and emit Agent-Lightning rewards.
Module Exports | Gradata/src/gradata/integrations/agent_lightning/__init__.py | Lazy-loading module initializer exposing GradataLitAgent, gradata_reward, and run_apo_tune via __getattr__ and __all__ without enforcing runtime imports.
CLI Integration | Gradata/src/gradata/cli.py | New cmd_tune() handler reads prompt from file, invokes run_apo_tune(), and outputs optimized prompt. Wired into CLI dispatch as gradata tune subcommand with arguments for prompt file, rounds, beam width, branch factor, brain directory, output file, and OpenAI API base.
Documentation & Examples | Gradata/README.md, Gradata/examples/tune_one_prompt.py | README section describes gradata tune APO-based tuning with installation and usage guidance. New example script demonstrates end-to-end tuning with lightweight runner and parameter settings.
Supporting Fixes | Gradata/src/gradata/enhancements/diff_engine.py | Hardened default embedder output conversion to robustly handle both numpy .tolist() and plain list-like vectors with appropriate type casts.
Tests | Gradata/tests/test_agent_lightning_bridge.py, Gradata/tests/conftest.py | Comprehensive test suite validates gradata_reward() scoring (exact, partial, no-match), GradataLitAgent rollout emissions, run_apo_tune() workflow, and CLI end-to-end flow. Added pytest --timeout option hook for CI parity. Uses monkeypatching to inject fake agent-lightning module.
Cleanup | .gitignore, Gradata/.gitignore | Added .hermes-backups/ ignore pattern in both files.

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as gradata tune
    participant Runner as run_apo_tune
    participant Brain as Gradata Brain
    participant Dataset as Correction Dataset
    participant APO as Agent-Lightning APO
    participant OpenAI as OpenAI API

    User->>CLI: gradata tune --prompt-file ...
    CLI->>Runner: run_apo_tune(brain_dir, prompt_template, runner_fn)
    Runner->>Brain: Load brain directory
    Runner->>Dataset: Read CORRECTION events from events.jsonl
    Dataset-->>Runner: tasks with draft/final pairs
    Runner->>Runner: Split dataset into train/val
    Runner->>Runner: Score baseline prompt (mean gradata_reward)
    alt rounds > 0
        Runner->>APO: Initialize APO with InMemoryLightningStore
        Runner->>APO: Configure Trainer with async OpenAI client
        loop APO Training Iterations
            APO->>Runner: Request prompt evaluation
            Runner->>OpenAI: Get response for task
            OpenAI-->>Runner: response text
            Runner->>Brain: Score response (gradata_reward via corrections)
            Brain-->>Runner: reward [0.0, 1.0]
            Runner->>APO: Emit reward for prompt variant
        end
        APO-->>Runner: Best optimized prompt
    end
    Runner->>Runner: Score optimized prompt on validation set
    Runner-->>CLI: {baseline_score, optimized_score, optimized_prompt, rounds_completed}
    CLI-->>User: Write/print optimized prompt and summary
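
The "split dataset" and "score baseline prompt" steps in the diagram can be sketched as below. This is not the code in runner.py: the function names, the 80/20 ratio, the fixed seed, and the callable signatures are all assumptions chosen for illustration.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def split_train_val(
    tasks: Sequence[dict], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[dict], List[dict]]:
    """Shuffle deterministically, then hold out a validation slice.

    The ratio and seed are illustrative assumptions.
    """
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]


def mean_reward(
    tasks: Sequence[dict],
    run: Callable[[dict], str],
    reward: Callable[[str, dict], float],
) -> float:
    """Average the reward of run(task) over the tasks (0.0 when empty)."""
    scores = [reward(run(task), task) for task in tasks]
    return mean(scores) if scores else 0.0
```

Scoring the baseline and the optimized prompt on the same held-out slice is what makes the two numbers in the returned summary comparable.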

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Gradata/gradata#85: The embedder output conversion hardening in diff_engine.py directly supports new semantic-embedding codepaths that consume embedder outputs.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title accurately summarizes the main change: introducing an agent-lightning bridge with APO-based auto-prompt tuning via the 'gradata tune' command.
Description check | ✅ Passed | The description comprehensively explains the feature integration, implementation details, test results, and strategic rationale—all directly related to the changeset.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot added the feature label May 5, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/cli.py`:
- Around line 1414-1424: The CLI defines a --branch option on the p_tune parser
but that value is never used in cmd_tune; either wire it into the tuning
workflow or, if reserved for future use, add an explicit comment in the cmd_tune
function noting that args.branch is intentionally unused (e.g., "args.branch
reserved for caller workflows — intentionally unused") and keep the argument for
backward compatibility; reference the p_tune parser and the cmd_tune function
when making the change.

In `@Gradata/src/gradata/integrations/agent_lightning/litagent.py`:
- Around line 63-79: training_rollout currently calls _load_emit_reward() on
every invocation; cache the emit function on the instance instead by loading it
once during _init_gradata() and storing it as self.emit_reward, then update
training_rollout to call self.emit_reward(reward) (references: training_rollout,
_load_emit_reward, _init_gradata, emit_reward).

In `@Gradata/src/gradata/integrations/agent_lightning/runner.py`:
- Around line 55-56: The code currently falls back to _expected_runner when
runner_fn is None (effective_runner = runner_fn or _expected_runner), causing
baseline_score = _score_prompt(...) to use ground-truth answers and produce
misleading APO signals; change this to require an explicit executor by either
(a) refusing to proceed / raising a clear error if runner_fn is None, or (b)
wiring a real fast executor implementation instead of _expected_runner; update
the logic around effective_runner, the call sites that compute baseline_score
(function _score_prompt) and any similar fallbacks around lines referenced (also
apply the same change to the repeated block around the 200-204 region) so APO
never silently defaults to the expected-label runner.
- Around line 41-42: The code currently mutates os.environ["OPENAI_API_BASE"];
instead, update _new_async_openai to accept an optional base_url parameter and
pass that into the AsyncOpenAI constructor (AsyncOpenAI(base_url=base_url)) so
clients use the provided openai_api_base without touching global env; then
change the caller to forward openai_api_base into _new_async_openai and remove
the os.environ assignment entirely.
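
Option (a) from the first runner.py finding, refusing to proceed when no executor is supplied, could look like the sketch below. The helper name `resolve_runner` and the `Callable[[str], str]` signature are assumptions; the review does not state what signature the real runner expects.

```python
from typing import Callable, Optional


def resolve_runner(
    runner_fn: Optional[Callable[[str], str]],
) -> Callable[[str], str]:
    """Return the caller's executor, or fail loudly when none was given.

    Hypothetical helper; the signature is an illustrative assumption.
    """
    if runner_fn is None:
        # Falling back to a runner that returns ground-truth answers would
        # inflate the baseline score and mislead the optimizer.
        raise ValueError(
            "runner_fn is required: pass a callable that executes the prompt"
        )
    return runner_fn
```

An explicit error here trades convenience for honesty: a missing executor becomes a visible configuration mistake instead of a silently perfect score.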
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 49c385be-d390-44bf-8c91-836bddbe647f

📥 Commits

Reviewing files that changed from the base of the PR and between b56816b and 87283ab.

⛔ Files ignored due to path filters (1)
  • Gradata/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • Gradata/README.md
  • Gradata/examples/tune_one_prompt.py
  • Gradata/pyproject.toml
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/enhancements/diff_engine.py
  • Gradata/src/gradata/integrations/agent_lightning/__init__.py
  • Gradata/src/gradata/integrations/agent_lightning/litagent.py
  • Gradata/src/gradata/integrations/agent_lightning/reward.py
  • Gradata/src/gradata/integrations/agent_lightning/runner.py
  • Gradata/tests/conftest.py
  • Gradata/tests/test_agent_lightning_bridge.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.12
🧰 Additional context used
📓 Path-based instructions (3)
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/conftest.py
  • Gradata/tests/test_agent_lightning_bridge.py
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/integrations/agent_lightning/runner.py
  • Gradata/src/gradata/integrations/agent_lightning/__init__.py
  • Gradata/src/gradata/enhancements/diff_engine.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/integrations/agent_lightning/reward.py
  • Gradata/src/gradata/integrations/agent_lightning/litagent.py
Gradata/**/pyproject.toml

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Maintain dependencies = [] in pyproject.toml — the base package is pure Python + stdlib with all heavy dependencies gated as optional extras: embeddings, gemini, encrypted, ranking, adapters-mem0

Files:

  • Gradata/pyproject.toml
🔇 Additional comments (9)
Gradata/tests/conftest.py (1)

25-28: LGTM!

The pytest_addoption hook correctly registers the --timeout flag as a passthrough, preventing test failures when CI passes this flag but pytest-timeout is not installed.
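
A passthrough hook of the kind described here might look like the following sketch. It uses pytest's `parser.addoption` hook API; the help text and the choice of `action="store"` are assumptions, and the actual conftest.py lines are not shown in this review.

```python
def pytest_addoption(parser):
    """Accept --timeout so CI invocations work without the pytest-timeout plugin.

    The value is parsed and then ignored; nothing here enforces a timeout.
    """
    parser.addoption(
        "--timeout",
        action="store",
        default=None,
        help="Accepted for CI parity; has no effect without pytest-timeout.",
    )
```

One caveat worth checking: if pytest-timeout is also installed, registering the same flag twice would normally conflict, so a real hook may need a guard for that case. This excerpt does not show whether one exists.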

Gradata/README.md (1)

127-142: LGTM!

The Auto-Improvement documentation clearly describes the gradata tune command, installation via gradata[tune-apo], and accurately explains the reward signal mechanism. The example command matches the CLI implementation.

Gradata/src/gradata/enhancements/diff_engine.py (1)

290-292: LGTM!

The vector conversion logic correctly handles both numpy arrays (via tolist()) and other iterable types. The type casts are appropriate for static analysis without affecting runtime behavior.
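
The conversion described in this comment can be sketched without importing numpy, by duck-typing on `tolist()`. The function name `to_float_list` is hypothetical and this is not the code at lines 290-292 of diff_engine.py.

```python
from typing import Any, List


def to_float_list(vector: Any) -> List[float]:
    """Normalize an embedder output to a plain list of floats.

    Array-like objects (for example numpy arrays) expose tolist();
    anything else is treated as a generic iterable.
    """
    tolist = getattr(vector, "tolist", None)
    values = tolist() if callable(tolist) else list(vector)
    return [float(v) for v in values]
```

Checking for the method instead of the type keeps numpy optional: the helper works whether the embedder returns an array, a list, or a tuple.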

Gradata/src/gradata/integrations/agent_lightning/reward.py (2)

16-30: LGTM!

The reward computation is well-structured with proper null handling and graceful degradation (returning 0.5 when no matching corrections exist). The early return on empty finals prevents the max() call on an empty sequence.


55-74: LGTM!

Error handling is appropriately defensive — both brain.search() and brain.query_events() failures are caught and logged at debug level, falling back gracefully to empty results rather than propagating exceptions.
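
The scoring shape these two comments describe (best SequenceMatcher ratio, neutral 0.5 when there is nothing to compare against) can be sketched as below. The function name `sketch_reward` is hypothetical, and the lookup of corrections through brain.search() is left out, so this covers only the string-similarity part.

```python
from difflib import SequenceMatcher
from typing import Iterable


def sketch_reward(output: str, finals: Iterable[str], neutral: float = 0.5) -> float:
    """Score an output against corrected 'final' texts, in [0.0, 1.0].

    Returns the neutral value when no usable corrections exist, which also
    avoids calling max() on an empty sequence.
    """
    candidates = [final for final in finals if final]
    if not candidates:
        return neutral
    return max(SequenceMatcher(None, output, final).ratio() for final in candidates)
```

A neutral 0.5 for missing history means an unscored prompt variant is neither rewarded nor penalized relative to the others.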

Gradata/src/gradata/integrations/agent_lightning/litagent.py (1)

119-135: LGTM!

The factory pattern using __new__ to create a runtime subclass that inherits from both the mixin and LitAgent is a clean approach for integrating with an optional dependency while preserving type safety.

Gradata/examples/tune_one_prompt.py (1)

10-29: LGTM!

The example clearly demonstrates the tuning workflow with a mock runner function and explains that users should replace it with their OpenAI-compatible client call. The result dictionary access matches the keys returned by run_apo_tune.

Gradata/src/gradata/cli.py (1)

508-538: LGTM!

The cmd_tune implementation correctly:

  • Imports the runner at call site (lazy loading for optional dependency)
  • Uses _resolve_brain_root for consistent brain directory resolution
  • Handles output to file or stdout based on --out flag
  • Reports meaningful summary with baseline/optimized scores
Gradata/pyproject.toml (1)

35-40: agentlightning>=0.3.0 is available on PyPI and the version constraint correctly allows the current release while accepting the desired 0.3.1 once published.

Comment thread Gradata/src/gradata/cli.py
Comment thread Gradata/src/gradata/integrations/agent_lightning/litagent.py
Comment thread Gradata/src/gradata/integrations/agent_lightning/runner.py Outdated
Comment thread Gradata/src/gradata/integrations/agent_lightning/runner.py Outdated
- runner.py: raise ValueError when runner_fn is None (no silent ground-truth fallback)
- runner.py: pass base_url to AsyncOpenAI directly instead of mutating os.environ
- litagent.py: cache emit_reward in __init__ instead of per-call lookup
- cli.py: wire --branch flag through to branch_factor arg
- tests: add test_run_apo_tune_rejects_missing_runner_fn

All 4 CodeRabbit blocking findings cleared.
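
The second fix in this commit, passing the base URL to the client instead of mutating os.environ, can be sketched as follows. The function name and the `client_cls` injection parameter are assumptions added so the sketch can be exercised without the openai package; `AsyncOpenAI(base_url=...)` is the constructor argument named in the review.

```python
from typing import Any, Optional


def new_async_openai(base_url: Optional[str] = None, client_cls: Any = None) -> Any:
    """Build an async client for an explicit endpoint without touching os.environ.

    client_cls is a hypothetical injection point for tests; by default the
    real AsyncOpenAI class is imported lazily at call time.
    """
    if client_cls is None:
        try:
            from openai import AsyncOpenAI
        except ImportError as exc:
            raise RuntimeError("the 'openai' package is required for tuning") from exc
        client_cls = AsyncOpenAI
    # Only forward base_url when one was given, so the client default applies otherwise.
    kwargs = {} if base_url is None else {"base_url": base_url}
    return client_cls(**kwargs)
```

Keeping the endpoint in a constructor argument means two tuning runs in the same process can target different proxies without interfering with each other.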

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.hermes-backups/shannon-gradata-partial-20260430-192137/deliverables:
- Line 1: Remove the
`.hermes-backups/shannon-gradata-partial-20260430-192137/deliverables` backup
subproject-pointer file from the PR and ensure it is not committed to mainline;
delete the file from the branch (or revert the add) and update .gitignore to
include the `.hermes-backups/` path so future backup artifacts are ignored.
Locate the added artifact by name `deliverables` under the
`.hermes-backups/shannon-gradata-partial-20260430-192137/` directory in the diff
and remove that change from the commit history or create a new commit that
deletes it, then add `.hermes-backups/` to the repo ignore rules (or confirm an
existing ignore covers that pattern).

In `@Gradata/src/gradata/cli.py`:
- Around line 508-521: cmd_tune currently calls run_apo_tune without a
runner_fn, but run_apo_tune now rejects runner_fn=None; fix by constructing and
passing a real prompt executor to run_apo_tune: import or instantiate your
project's prompt executor (e.g., create_prompt_executor(...) or PromptExecutor
and its .run method) inside cmd_tune after reading the prompt, create a callable
runner_fn that accepts the same signature run_apo_tune expects, then call
run_apo_tune(..., runner_fn=runner_fn, prompt_template=prompt,
rounds=args.rounds, beam_width=args.beam, branch_factor=args.branch,
openai_api_base=args.openai_api_base) while keeping _resolve_brain_root(args)
for the brain root argument.
- Around line 529-537: The summary line currently prints to stdout after
printing the optimized prompt (variables optimized and result.get(...)), which
contaminates pipelines; change the summary print to write to stderr instead
(e.g., use sys.stderr or the CLI's logger) so stdout remains only the optimized
prompt. Locate the block in gradata.cli where optimized is printed and replace
the final print call that formats "baseline=... optimized=... rounds=..." to
emit to stderr (reference the formatted string using
baseline=float(result.get("baseline_score", 0.0)),
optimized=float(result.get("optimized_score", 0.0)),
rounds=int(result.get("rounds_completed", 0))).

In `@Gradata/tests/test_agent_lightning_bridge.py`:
- Around line 15-18: Remove the module-level pytest skip that uses
find_spec("agentlightning") in tests/test_agent_lightning_bridge.py so the
deterministic bridge tests run in CI even when the optional extra isn't
installed; instead, keep only true integration tests behind optional-dependency
skips and rely on the existing test helper _install_fake_agentlightning() to
stub the agentlightning module for these unit tests. Locate the pytestmark =
pytest.mark.skipif(...) declaration and delete or disable it, ensuring the test
file continues to call _install_fake_agentlightning() at setup and that any
real-LM integration cases are explicitly marked with `@pytest.mark.integration`
and left skipped by default.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cde24b34-f05a-429c-bf59-28b84ec3a773

📥 Commits

Reviewing files that changed from the base of the PR and between 87283ab and d5a5df2.

📒 Files selected for processing (5)
  • .hermes-backups/shannon-gradata-partial-20260430-192137/deliverables
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/integrations/agent_lightning/litagent.py
  • Gradata/src/gradata/integrations/agent_lightning/runner.py
  • Gradata/tests/test_agent_lightning_bridge.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.12
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/integrations/agent_lightning/runner.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/integrations/agent_lightning/litagent.py
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/test_agent_lightning_bridge.py

Comment thread .hermes-backups/shannon-gradata-partial-20260430-192137/deliverables Outdated
Comment on lines +508 to +521
def cmd_tune(args):
    """Tune a prompt file with Agent-Lightning APO and Gradata corrections."""
    from gradata.integrations.agent_lightning.runner import run_apo_tune

    prompt_path = Path(args.prompt_file)
    prompt = prompt_path.read_text(encoding="utf-8")
    result = run_apo_tune(
        _resolve_brain_root(args),
        prompt_template=prompt,
        rounds=args.rounds,
        beam_width=args.beam,
        branch_factor=args.branch,
        openai_api_base=args.openai_api_base,
    )

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Wire a real prompt executor into cmd_tune before exposing this command.

run_apo_tune() now rejects runner_fn=None, but this handler never constructs or passes one. That means every gradata tune ... invocation fails immediately with ValueError, so the new CLI path is not actually usable yet.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/cli.py` around lines 508 - 521, cmd_tune currently calls
run_apo_tune without a runner_fn, but run_apo_tune now rejects runner_fn=None;
fix by constructing and passing a real prompt executor to run_apo_tune: import
or instantiate your project's prompt executor (e.g., create_prompt_executor(...)
or PromptExecutor and its .run method) inside cmd_tune after reading the prompt,
create a callable runner_fn that accepts the same signature run_apo_tune
expects, then call run_apo_tune(..., runner_fn=runner_fn,
prompt_template=prompt, rounds=args.rounds, beam_width=args.beam,
branch_factor=args.branch, openai_api_base=args.openai_api_base) while keeping
_resolve_brain_root(args) for the brain root argument.
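
One way to picture the fix described above, as a sketch only. The real `run_apo_tune` signature and the expected shape of `runner_fn` are not shown in this thread, so `run_apo_tune_stub`, `echo_runner`, `cmd_tune_sketch`, and the `(prompt, task_input) -> str` callable shape are all hypothetical stand-ins.

```python
from typing import Callable, Optional

# Assumed shape of a prompt executor: (prompt_template, task_input) -> model output.
RunnerFn = Callable[[str, str], str]


def run_apo_tune_stub(
    brain_root: str,
    *,
    prompt_template: str,
    runner_fn: Optional[RunnerFn] = None,
) -> dict:
    """Stand-in for run_apo_tune that only mirrors the runner_fn=None rejection."""
    if runner_fn is None:
        raise ValueError("runner_fn is required")
    # The real function would run APO here; the stub just executes the prompt once.
    return {
        "optimized_prompt": prompt_template,
        "sample": runner_fn(prompt_template, "ping"),
    }


def echo_runner(prompt: str, task_input: str) -> str:
    """Hypothetical executor; a real one would call an LLM backend."""
    return f"{prompt}::{task_input}"


def cmd_tune_sketch(brain_root: str, prompt: str) -> dict:
    # The handler builds the executor itself and always passes it through.
    return run_apo_tune_stub(brain_root, prompt_template=prompt, runner_fn=echo_runner)
```

The point of the sketch is only that the CLI handler owns construction of the executor, so the `ValueError` path is unreachable from `gradata tune`.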

Comment on lines +529 to +537
    else:
        print(optimized)

    print(
        "baseline={baseline:.3f} optimized={optimized:.3f} rounds={rounds}".format(
            baseline=float(result.get("baseline_score", 0.0)),
            optimized=float(result.get("optimized_score", 0.0)),
            rounds=int(result.get("rounds_completed", 0)),
        )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep stdout reserved for the optimized prompt.

When --out is omitted, the prompt body is printed and then this summary line is appended to stdout. That breaks gradata tune prompt.md > optimized.md and any pipeline that treats stdout as the prompt text. Emit the metrics on stderr instead.

💡 Minimal fix
     else:
         print(optimized)

     print(
         "baseline={baseline:.3f} optimized={optimized:.3f} rounds={rounds}".format(
             baseline=float(result.get("baseline_score", 0.0)),
             optimized=float(result.get("optimized_score", 0.0)),
             rounds=int(result.get("rounds_completed", 0)),
-        )
+        ),
+        file=sys.stderr,
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/cli.py` around lines 529 - 537, The summary line
currently prints to stdout after printing the optimized prompt (variables
optimized and result.get(...)), which contaminates pipelines; change the summary
print to write to stderr instead (e.g., use sys.stderr or the CLI's logger) so
stdout remains only the optimized prompt. Locate the block in gradata.cli where
optimized is printed and replace the final print call that formats "baseline=...
optimized=... rounds=..." to emit to stderr (reference the formatted string
using baseline=float(result.get("baseline_score", 0.0)),
optimized=float(result.get("optimized_score", 0.0)),
rounds=int(result.get("rounds_completed", 0))).
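
A small self-contained sketch of the stdout/stderr split suggested above. The function name `emit_result` is hypothetical; the dictionary keys are the ones used in the snippet under review.

```python
import sys
from typing import Any, Mapping


def emit_result(optimized: str, result: Mapping[str, Any]) -> None:
    """Print the prompt on stdout and the score summary on stderr."""
    print(optimized)  # stdout carries only the prompt body
    print(
        "baseline={baseline:.3f} optimized={optimized:.3f} rounds={rounds}".format(
            baseline=float(result.get("baseline_score", 0.0)),
            optimized=float(result.get("optimized_score", 0.0)),
            rounds=int(result.get("rounds_completed", 0)),
        ),
        file=sys.stderr,  # metrics stay out of redirected prompt files
    )
```

With this split, a shell redirect such as `gradata tune prompt.md > optimized.md` would capture only the prompt text, while the summary line would still appear in the terminal.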

Comment on lines +15 to +18
pytestmark = pytest.mark.skipif(
    find_spec("agentlightning") is None,
    reason="agentlightning is not installed",
)

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Don’t skip these deterministic bridge tests when the optional extra is missing.

These tests already stub agentlightning with _install_fake_agentlightning(), so the module-level skipif(find_spec(...)) removes CI coverage for the bridge in the exact no-extra environment this PR claims to support. Keep only true real-dependency integration cases behind optional-dependency skips.

As per coding guidelines, "Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/tests/test_agent_lightning_bridge.py` around lines 15 - 18, Remove
the module-level pytest skip that uses find_spec("agentlightning") in
tests/test_agent_lightning_bridge.py so the deterministic bridge tests run in CI
even when the optional extra isn't installed; instead, keep only true
integration tests behind optional-dependency skips and rely on the existing test
helper _install_fake_agentlightning() to stub the agentlightning module for
these unit tests. Locate the pytestmark = pytest.mark.skipif(...) declaration
and delete or disable it, ensuring the test file continues to call
_install_fake_agentlightning() at setup and that any real-LM integration cases
are explicitly marked with `@pytest.mark.integration` and left skipped by default.
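
For illustration, stubbing an optional dependency so deterministic unit tests run without it could look like the sketch below. The body of the real `_install_fake_agentlightning()` is not shown in this thread; `install_fake_module` and the `IS_FAKE` attribute are hypothetical, and nothing here reflects the actual `agentlightning` API.

```python
import importlib
import sys
import types


def install_fake_module(name: str, **attrs: object) -> types.ModuleType:
    """Register a stub module in sys.modules so `import <name>` succeeds."""
    module = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(module, key, value)
    # The import system checks sys.modules first, so later imports get the stub.
    sys.modules[name] = module
    return module


# Usage: the stub is what any subsequent import of the name resolves to.
fake = install_fake_module("agentlightning_stub_demo", IS_FAKE=True)
loaded = importlib.import_module("agentlightning_stub_demo")
```

Inside a pytest test, `monkeypatch.setitem(sys.modules, name, module)` gives the same effect and restores the original entry automatically when the test ends.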

@Gradata Gradata merged commit 3d7b7f8 into main May 6, 2026
9 checks passed
@Gradata Gradata deleted the v0.7.1-agent-lightning-bridge branch May 6, 2026 08:05