fix: prevent duplicate eval run entries during suspend/resume#1176

Merged
Chibionos merged 3 commits into main from fix/duplicate-eval-runs-on-resume
Jan 22, 2026

Conversation

@Chibionos

Contributor

Problem

When running evaluations with suspend/resume, two separate entries are created in StudioWeb instead of updating the same entry:

Entry #1 (Suspend Phase):

  • Status: "suspended"
  • Contains: Triggers for resume
  • Missing: Evaluator results

Entry #2 (Resume Phase):

  • Status: "completed"
  • Contains: Evaluator results
  • Missing: Suspend information

Both entries have the same evalSetRunId and evalSnapshot.id but different entry IDs, so the StudioWeb (SW) UI shows one evaluation run as two unrelated rows.

Root Cause

The CREATE_EVAL_RUN event is published on BOTH suspend execution AND resume execution in src/uipath/_cli/_evals/_runtime.py (lines 516-522).

Execution Flow

Suspend Phase:

  1. Line 516: Publishes CREATE_EVAL_RUN → Creates Entry #1
  2. Agent executes and calls interrupt() → Returns SUSPENDED
  3. Line 610: Publishes UPDATE_EVAL_RUN → Updates Entry #1 with suspend info

Resume Phase:

  1. Line 516: Publishes CREATE_EVAL_RUN AGAIN → Creates Entry #2
  2. Agent resumes and completes execution
  3. Line 610+: Publishes UPDATE_EVAL_RUN → Updates Entry #2 with completion info
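
The flow above can be reproduced with a toy model. This is only a sketch: `ToyStore`, `run_phase`, and the string event names are hypothetical stand-ins, not the SDK's event bus or the StudioWeb backend. It assumes each CREATE_EVAL_RUN makes a new entry and each UPDATE_EVAL_RUN modifies the most recent one.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for whatever stores eval run entries."""

    entries: list[dict] = field(default_factory=list)

    async def publish(self, event: str, payload: dict) -> None:
        if event == "CREATE_EVAL_RUN":
            # Every CREATE appends a brand-new entry with a fresh ID.
            self.entries.append({"id": len(self.entries) + 1, **payload})
        elif event == "UPDATE_EVAL_RUN":
            # UPDATE modifies the most recently created entry.
            self.entries[-1].update(payload)


async def run_phase(store: ToyStore, final_status: str) -> None:
    # Pre-fix behaviour: CREATE is published unconditionally in every phase.
    await store.publish(
        "CREATE_EVAL_RUN", {"evalSetRunId": "run-1", "status": "pending"}
    )
    await store.publish("UPDATE_EVAL_RUN", {"status": final_status})


async def main() -> list[dict]:
    store = ToyStore()
    await run_phase(store, "suspended")  # suspend phase
    await run_phase(store, "completed")  # resume phase
    return store.entries


entries = asyncio.run(main())
print(len(entries))  # → 2
```

The two entries share the same evalSetRunId but carry different IDs and statuses, which matches the symptom described under Problem.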

Solution

Added a check for self.context.resume before publishing CREATE_EVAL_RUN:

# Only create eval run entry if NOT resuming from a checkpoint
# When resuming, the entry already exists from the suspend phase
if not self.context.resume:
    await self.event_bus.publish(
        EvaluationEvents.CREATE_EVAL_RUN,
        EvalRunCreatedEvent(
            execution_id=execution_id,
            eval_item=eval_item,
        ),
    )

Now:

  • On initial execution: Creates new eval run entry (as before)
  • On resume: Skips creation, only updates the existing entry
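
To show the effect of the guard in isolation, here is a minimal sketch. `RecordingBus` and `execute_eval` are hypothetical names made up for illustration, and a plain `resume` argument stands in for `self.context.resume`. The real runtime method does considerably more.

```python
import asyncio


class RecordingBus:
    """Hypothetical event bus that only records which events were published."""

    def __init__(self) -> None:
        self.published: list[str] = []

    async def publish(self, event: str, payload: dict) -> None:
        self.published.append(event)


async def execute_eval(bus: RecordingBus, resume: bool) -> None:
    # Same guard as in the fix: create the entry only on the initial run.
    if not resume:
        await bus.publish("CREATE_EVAL_RUN", {"execution_id": "exec-1"})
    # ... the agent would execute (or resume) here ...
    await bus.publish("UPDATE_EVAL_RUN", {"execution_id": "exec-1"})


bus = RecordingBus()
asyncio.run(execute_eval(bus, resume=False))  # initial run, ends suspended
asyncio.run(execute_eval(bus, resume=True))   # resumed run, ends completed
print(bus.published)
# → ['CREATE_EVAL_RUN', 'UPDATE_EVAL_RUN', 'UPDATE_EVAL_RUN']
```

Across the whole suspend/resume cycle there is one CREATE and two UPDATEs, so both updates land on the same entry.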

Impact

  • Users will see a single eval run entry with the complete lifecycle: pending → suspended → completed
  • StudioWeb UI will show cleaner results and accurate metrics
  • Trace data will have a single eval run with complete history

Testing

Tested with local suspend/resume evaluation cycles to verify:

  • Only ONE entry is created in SW
  • Entry is updated correctly during suspend phase
  • Entry is updated correctly during resume phase
  • No duplicate entries appear

Related Documentation

Investigation documented in: SUSPEND_RESUME_DUPLICATE_ENTRIES_INVESTIGATION.md (backend repo)

@github-actions github-actions bot added the test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) labels Jan 22, 2026
@smflorentino smflorentino self-requested a review January 22, 2026 15:42
Chibi Vikram and others added 3 commits January 22, 2026 09:53
Fixes duplicate eval run entries in StudioWeb during suspend/resume cycles.

## Problem
When running evaluations with suspend/resume, two separate entries were
created in StudioWeb instead of updating the same entry:
- First entry: Created during suspend phase with "suspended" status
- Second entry: Created during resume phase with "completed" status

Both entries had the same evalSetRunId and evalSnapshot.id but different
entry IDs, causing confusion in the SW UI.

## Root Cause
The CREATE_EVAL_RUN event was published on BOTH suspend execution AND
resume execution (lines 516-522). This created a new database entry each
time, instead of updating the existing entry on resume.

## Solution
Added a check for `self.context.resume` before publishing CREATE_EVAL_RUN.
Now:
- On initial execution: Creates new eval run entry (as before)
- On resume: Skips creation, only updates the existing entry

## Impact
- Users will see a single eval run entry with complete lifecycle:
  pending → suspended → completed
- StudioWeb UI will show cleaner results and accurate metrics
- Trace data will have a single eval run with complete history

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add comprehensive test suite to verify that:
- Normal flow: CREATE_EVAL_RUN event is published
- Resume flow: CREATE_EVAL_RUN event is NOT published (preventing duplicates)
- UPDATE_EVAL_RUN continues to work in all scenarios
- Complete suspend/resume lifecycle operates correctly

Tests cover:
- Successful execution with CREATE_EVAL_RUN
- Suspend execution with CREATE_EVAL_RUN
- Resume skipping CREATE_EVAL_RUN
- Resume still publishing UPDATE_EVAL_RUN
- No duplicate entries on resume
- Complete suspend-then-resume lifecycle

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

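
The test cases listed in the commit above could look roughly like the sketch below. It is not the PR's actual test suite: `execute_eval` is a hypothetical stand-in for the runtime method, and the event bus is replaced wholesale by `unittest.mock.AsyncMock` instead of the real EventBus.

```python
import asyncio
from unittest.mock import AsyncMock


async def execute_eval(event_bus, resume: bool) -> None:
    """Hypothetical stand-in for the runtime method under test."""
    if not resume:
        await event_bus.publish("CREATE_EVAL_RUN", {"execution_id": "exec-1"})
    await event_bus.publish("UPDATE_EVAL_RUN", {"execution_id": "exec-1"})


def published_events(event_bus) -> list[str]:
    # The first positional argument of each awaited publish() call is the event name.
    return [call.args[0] for call in event_bus.publish.await_args_list]


def test_initial_run_publishes_create() -> None:
    event_bus = AsyncMock()
    asyncio.run(execute_eval(event_bus, resume=False))
    assert published_events(event_bus) == ["CREATE_EVAL_RUN", "UPDATE_EVAL_RUN"]


def test_resume_skips_create_but_still_updates() -> None:
    event_bus = AsyncMock()
    asyncio.run(execute_eval(event_bus, resume=True))
    assert published_events(event_bus) == ["UPDATE_EVAL_RUN"]
```

Asserting on the full ordered list of events, not just on the absence of CREATE_EVAL_RUN, also catches a regression where UPDATE_EVAL_RUN stops being published on resume.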
Add type: ignore[method-assign] comments to fix mypy errors when
mocking EventBus.publish method in tests. This is a common testing
pattern where we replace methods with mocks.

Fixes 3 mypy errors:
- Line 141: event_bus fixture
- Line 309: suspend phase event bus
- Line 338: resume phase event bus

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
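
For context on the mypy commit above, this sketch shows the pattern being annotated. The `EventBus` class here is a hypothetical minimal version, since the real class is not shown in this PR. Assigning a mock over a method is valid at runtime, but mypy flags it under the `method-assign` error code unless the line is ignored.

```python
import asyncio
from unittest.mock import AsyncMock


class EventBus:
    """Hypothetical minimal bus, used only to illustrate the mocking pattern."""

    async def publish(self, event: str, payload: object) -> None:
        raise RuntimeError("the real publish() should not run in tests")


event_bus = EventBus()
publish_mock = AsyncMock()

# Without the ignore comment, mypy reports a [method-assign] error on this line.
event_bus.publish = publish_mock  # type: ignore[method-assign]

asyncio.run(event_bus.publish("UPDATE_EVAL_RUN", {"status": "completed"}))
publish_mock.assert_awaited_once()
```

Keeping a separate `publish_mock` reference lets the assertions be made on the mock object directly, so no further ignores are needed when checking calls.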
@Chibionos Chibionos force-pushed the fix/duplicate-eval-runs-on-resume branch from f2a00d0 to 8c7bbcf Compare January 22, 2026 17:53
@Chibionos Chibionos merged commit de03766 into main Jan 22, 2026
89 checks passed
@Chibionos Chibionos deleted the fix/duplicate-eval-runs-on-resume branch January 22, 2026 17:59