fix: legacy evaluation reporting to backend with Strategy Pattern #1040

Closed
Chibionos wants to merge 5 commits into main from fix/legacy-eval-request-wrapper
@Chibionos Chibionos commented Dec 19, 2025

Contributor

Summary

  • Fix legacy evaluation reporting (HTTP 400 errors)
  • Implement Strategy Pattern for legacy vs coded eval flows
  • Refactor into modular _reporting/ package
  • Add logging for eval set run schema reporting
  • Bump version to 2.2.37

Changes

  • _reporting/_strategies.py - Protocol + strategy implementations
  • _reporting/_reporter.py - Main StudioWebProgressReporter class with logging
  • _reporting/_utils.py - Error handling decorator
  • Backward compatibility maintained via re-exports
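The decorator in `_reporting/_utils.py` is not shown in this description. As a rough sketch of what an error handling decorator of this kind might look like, assuming the goal is that a failed reporting call is logged instead of aborting the evaluation run (the name `swallow_reporting_errors` and the return-`None` behaviour are assumptions, not taken from the PR):

```python
import functools
import logging
from typing import Any, Callable, Optional, TypeVar

logger = logging.getLogger("eval_reporting_sketch")
T = TypeVar("T")


def swallow_reporting_errors(func: Callable[..., T]) -> Callable[..., Optional[T]]:
    """Hypothetical decorator: log any exception from a reporting call and return None."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Optional[T]:
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # reporting is treated as best-effort in this sketch
            logger.warning("Reporting call %s failed: %s", func.__name__, exc)
            return None

    return wrapper


@swallow_reporting_errors
def report_ok() -> str:
    # Stand-in for a reporting call that succeeds.
    return "reported"


@swallow_reporting_errors
def report_failing() -> str:
    # Stand-in for a reporting call rejected by the backend.
    raise RuntimeError("HTTP 400")
```

With this shape, `report_failing()` returns `None` and leaves a warning in the log, while `report_ok()` passes its result through unchanged.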

Logging Added

  • INFO-level logging when creating eval set runs showing inputSchema and outputSchema
  • DEBUG-level logging for full payloads on all eval reporting operations
  • WARNING when entrypoint is not provided, falling back to empty schemas
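The three log levels above could be wired up roughly as follows. This is a minimal sketch under stated assumptions: the function names, the `schemas_by_entrypoint` lookup, and the `agentSnapshot` payload key are hypothetical and chosen for illustration; only the `inputSchema` and `outputSchema` names and the log levels come from the description.

```python
import json
import logging
from typing import Any, Dict, Optional, Tuple

logger = logging.getLogger("eval_reporting_sketch")


def resolve_schemas(
    entrypoint: Optional[str],
    schemas_by_entrypoint: Dict[str, Dict[str, Any]],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (inputSchema, outputSchema), falling back to empty schemas."""
    if not entrypoint:
        # WARNING: no entrypoint, so there is nothing to extract schemas from.
        logger.warning("Entrypoint not provided; falling back to empty schemas")
        return {}, {}
    entry = schemas_by_entrypoint.get(entrypoint, {})
    return entry.get("input", {}), entry.get("output", {})


def build_eval_set_run_payload(
    entrypoint: Optional[str],
    schemas_by_entrypoint: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Build a hypothetical eval set run payload and log it at INFO and DEBUG."""
    input_schema, output_schema = resolve_schemas(entrypoint, schemas_by_entrypoint)
    # INFO: just the schemas, enough to spot an empty or wrong schema quickly.
    logger.info(
        "Creating eval set run: inputSchema=%s outputSchema=%s",
        json.dumps(input_schema),
        json.dumps(output_schema),
    )
    payload = {
        "agentSnapshot": {"inputSchema": input_schema, "outputSchema": output_schema}
    }
    # DEBUG: the full payload, which may be large.
    logger.debug("Eval set run payload: %s", json.dumps(payload))
    return payload
```

Keeping the full payload at DEBUG means normal runs only show the schemas, and the complete request body appears only when debug logging is switched on.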

Tests

  • 33 tests for reporter (including new agent snapshot extraction tests)
  • All lint and format checks passing

🤖 Generated with Claude Code

@github-actions bot added the labels test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) on Dec 19, 2025
This PR fixes legacy evaluation reporting to the backend that was returning
HTTP 400 errors and implements the Strategy Pattern for cleaner code separation.

## Changes

### Strategy Pattern Implementation
- Created `EvalReportingStrategy` Protocol defining the interface for evaluation
  reporting strategies
- Implemented `LegacyEvalReportingStrategy` for legacy evaluations:
  - Converts string IDs to deterministic GUIDs using uuid5
  - Uses endpoints without /coded/ prefix
  - Uses assertionRuns format with assertionSnapshot
- Implemented `CodedEvalReportingStrategy` for coded evaluations:
  - Keeps IDs as strings
  - Uses /coded/ endpoint prefix
  - Uses evaluatorRuns format with evaluationCriterias
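
The split described above can be sketched as follows. This is an illustration only: the class names match the ones listed, but the method names, the `evalRun` path, the `evalId` field, and the choice of `uuid.NAMESPACE_URL` as the uuid5 namespace are assumptions, since the commit message does not show the actual Protocol or payloads.

```python
import uuid
from typing import Any, Dict, Protocol

# Assumption: any fixed namespace makes uuid5 deterministic; the real one is not stated.
_ID_NAMESPACE = uuid.NAMESPACE_URL


class EvalReportingStrategy(Protocol):
    """Interface sketch; the real Protocol's methods are not shown in the PR."""

    def convert_id(self, raw_id: str) -> str: ...
    def endpoint(self, path: str) -> str: ...
    def runs_field(self) -> str: ...


class LegacyEvalReportingStrategy:
    def convert_id(self, raw_id: str) -> str:
        # Same input string always maps to the same GUID.
        return str(uuid.uuid5(_ID_NAMESPACE, raw_id))

    def endpoint(self, path: str) -> str:
        return path  # no coded/ prefix

    def runs_field(self) -> str:
        return "assertionRuns"


class CodedEvalReportingStrategy:
    def convert_id(self, raw_id: str) -> str:
        return raw_id  # IDs stay as plain strings

    def endpoint(self, path: str) -> str:
        return f"coded/{path}"

    def runs_field(self) -> str:
        return "evaluatorRuns"


def build_request(
    strategy: EvalReportingStrategy, eval_id: str, path: str
) -> Dict[str, Any]:
    """Hypothetical helper: the reporter delegates every flow-specific choice."""
    return {
        "url": strategy.endpoint(path),
        "body": {"evalId": strategy.convert_id(eval_id), strategy.runs_field(): []},
    }
```

Because `EvalReportingStrategy` is a `Protocol`, neither strategy class needs to inherit from it; any object with matching methods satisfies the type checker, which is consistent with the ABC imports being removed.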

### Bug Fixes
- Fixed legacy eval API payload structure for backend compatibility
- Added type assertion for project_id to fix mypy errors
- Removed unused ABC, abstractmethod imports after Protocol migration

### Test Results
- All 27 unit tests passing
- All linting checks (ruff, mypy) passing
- Integration testing with calculator sample: all API calls returning HTTP 200 OK

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Chibionos force-pushed the fix/legacy-eval-request-wrapper branch from ab72b22 to c6cd5c3 on December 19, 2025 04:40
- Create _reporting/ package with focused modules
- Split strategies, utils, and reporter into separate files
- Maintain backward compatibility via re-exports
- Split tests to match new structure (48 tests, up from 27)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Chibionos Chibionos requested a review from mjnovice December 19, 2025 07:18
Chibi Vikram and others added 2 commits December 18, 2025 23:55
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add INFO-level logging to show inputSchema and outputSchema when
  creating eval set runs for better debugging
- Add DEBUG-level logging for full payloads on all eval reporting operations
- Add warning when entrypoint is not provided, falling back to empty schemas
- Add tests for agent snapshot extraction behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,418 @@
"""Evaluation reporting strategies for legacy and coded evaluations.

Contributor

Can we split the strategies into separate files?

Split the monolithic _strategies.py into separate files for better
code organization:
- _strategy_protocol.py: Protocol definition
- _legacy_strategy.py: Legacy evaluation reporting strategy
- _coded_strategy.py: Coded evaluation reporting strategy
- _strategies.py: Re-exports for backward compatibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Chibionos

Contributor Author

Closing as stale: this branch predates the monorepo migration (#1403) and is 540+ commits behind — the target file moved to packages/uipath/src/uipath/_cli/_evals/_progress_reporter.py and changed by ~900 lines, so a rebase is not viable. The strategy-pattern reporting refactor will be reimplemented on current main as part of extracting the evaluation framework into its own package (so the python eval worker in the agents repo can consume evaluators directly). A fresh PR will follow.
