fix: improve eval resume flow consistency and add validation#1177
Merged
Conversation
a61549e to
c336c36
Compare
akshaylive
reviewed
Jan 23, 2026
akshaylive
left a comment
Collaborator
There was a problem hiding this comment.
How would this work when num_workers > 0? If it isn't supposed to work, we need to alert it somehow and halt execution. In other words, we may want to pass the resume to only the runtime instance that had previously suspended, and not to all runtime instances.
smflorentino
approved these changes
Jan 23, 2026
Contributor
Author
It won't work for more than 1, its a limitation of suspend and resume. We can raise an error. |
added 3 commits
January 25, 2026 12:46
Previously, the eval flow only passed UiPathExecuteOptions when resume=True, but passed no options at all when resume=False. This is inconsistent with the debug flow (cli_debug.py) which always explicitly passes the options. This change: - Makes the eval flow always pass UiPathExecuteOptions explicitly - Simplifies the if/else logic by separating input determination from execution - Ensures consistency across both debug and eval commands - Makes the resume=False intent explicit rather than relying on default behavior While functionally equivalent (execute() accepts options=None and defaults resume to False), this change improves code maintainability and explicitness.
Add comprehensive unit tests to verify that UiPathExecuteOptions is always passed explicitly in the eval flow, matching the debug flow pattern. Tests verify: - UiPathExecuteOptions(resume=False) is passed when resume=False - UiPathExecuteOptions(resume=True) is passed when resume=True - Options are NEVER None, always explicit - Behavior is consistent with debug flow Uses mocking to directly test the execute_runtime method, ensuring the specific code path we modified is properly tested.
- Add ValueError when resume mode is used with multiple evaluations - Validates early in initiate_evaluation() before expensive operations - Provides clear error message with guidance to use --eval-ids - Add test coverage with new multiple-evals.json fixture - Add test_resume_with_multiple_evaluations_raises_error() test
29ddfc2 to
cd4d8d5
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The eval flow had two issues:
Inconsistency in handling resume options compared to the debug flow:
cli_debug.py): Always explicitly passesUiPathExecuteOptions(resume=resume)regardless of the resume value_runtime.py): Only passedUiPathExecuteOptions(resume=True)when resume=True, but passed no options at all when resume=FalseMissing validation for the unsupported scenario of running multiple evaluations with resume mode enabled
Code Comparison (Issue #1)
Before (eval flow):
Debug flow (correct pattern):
Issue #2: Missing Validation
Resume mode relies on checkpoint discovery using a consistent
thread_id. When multiple evaluations run in parallel, each creates separate runtime contexts, making it impossible to determine which checkpoint to resume from. This was not being caught, potentially leading to confusing runtime behavior.Solutions
1. Consistent Options Passing
Made the eval flow consistent with the debug flow by:
UiPathExecuteOptionsexplicitly, regardless of resume valueAfter:
2. Resume Mode Validation
Added early validation in
initiate_evaluation()to catch the unsupported scenario:Benefits:
Testing
test_resume_with_multiple_evaluations_raises_error()to verify validationmultiple-evals.jsontest fixture with 2 evaluationsImpact