
fix(eval): emit string justification for coded evaluators #1709

Draft: ameyjain wants to merge 1 commit into main from fix/evaluator-justification-string

@ameyjain (Contributor)

Problem

Coded evaluators (tool-call count/args/order/output, json-similarity) returned a null justification in their score output, while the LLM-judge evaluators returned a populated one.

Root cause: each coded evaluator stored its explanation under a per-evaluator key (explained_tool_calls_count, explained_tool_calls_args, explained_tool_calls_outputs, lcs, matched_leaves/total_leaves) and never under a justification key. The downstream eval worker reads details["justification"], so for these evaluators the lookup always resolved to null, even though the structured detail was present.
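
The lookup failure can be reproduced with plain dictionaries. This is a minimal sketch, not the worker's actual code: read_justification is a hypothetical stand-in for the worker's lookup (written with .get, as in the commit message), and the shape of explained_tool_calls_count (tool name mapped to an explanation string) is assumed for illustration.

```python
from typing import Any, Optional

# Hypothetical dumped payloads; only the key names come from the PR description.
coded_details: dict[str, Any] = {
    "explained_tool_calls_count": {"search": "expected 2, got 2"},
}
llm_judge_details: dict[str, Any] = {
    "justification": "The answer cites the retrieved document correctly.",
}


def read_justification(details: dict[str, Any]) -> Optional[str]:
    # Mirrors the worker's lookup: only the "justification" key is consulted.
    return details.get("justification")


print(read_justification(coded_details))  # None: the detail exists, but under another key
print(read_justification(llm_judge_details))
```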

Fix

Add a computed justification: str field to each of the five coded justification models, derived from the model's existing structured detail via a shared format_explained_tool_calls helper. model_dump() now emits a string justification for every coded evaluator, matching LLMJudgeJustification, without changing any of the structured fields.
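
A minimal sketch of the approach for the tool-call-count model, assuming pydantic v2 (computed_field stacked on a property). The class name ToolCallCountJustification, the dict[str, str] type of explained_tool_calls_count, and the signature and output format of format_explained_tool_calls are all hypothetical; the PR does not show them.

```python
from pydantic import BaseModel, computed_field


def format_explained_tool_calls(explained: dict[str, str]) -> str:
    # Hypothetical helper: one "tool: explanation" line per tool, sorted for stable output.
    return "\n".join(f"{tool}: {text}" for tool, text in sorted(explained.items()))


class ToolCallCountJustification(BaseModel):
    # Existing structured detail; the fix leaves it unchanged.
    explained_tool_calls_count: dict[str, str]

    # Computed fields are serialized by model_dump() alongside regular fields.
    @computed_field  # type: ignore[prop-decorator]
    @property
    def justification(self) -> str:
        return format_explained_tool_calls(self.explained_tool_calls_count)


j = ToolCallCountJustification(
    explained_tool_calls_count={"search": "expected 2, got 2", "fetch": "expected 1, got 0"}
)
print(j.model_dump())
```

Because justification is computed, it appears in the dump but is not an input: callers still construct the model from the structured detail alone, and that detail is dumped exactly as before.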

| Evaluator | justification now derived from |
|---|---|
| tool-call-count | explained_tool_calls_count |
| tool-call-args | explained_tool_calls_args |
| tool-call-output | explained_tool_calls_outputs |
| tool-call-order | lcs |
| json-similarity | matched_leaves / total_leaves |
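
For tool-call-order and json-similarity the source detail is not a per-tool explanation map, so the string presumably needs different formatting. A hypothetical sketch: the function names, the wording, and the assumption that lcs is an ordered list of tool names are illustrative and not taken from the PR.

```python
def tool_call_order_justification(lcs: list[str]) -> str:
    # Assumes `lcs` is the ordered list of tool names common to expected and actual calls.
    if not lcs:
        return "Longest common subsequence is empty; no expected tool calls matched in order."
    return f"Longest common subsequence of length {len(lcs)}: " + " -> ".join(lcs)


def json_similarity_justification(matched_leaves: int, total_leaves: int) -> str:
    # Guard against an empty expected object to avoid division by zero.
    if total_leaves == 0:
        return "No leaves to compare."
    ratio = matched_leaves / total_leaves
    return f"{matched_leaves}/{total_leaves} leaves matched ({ratio:.0%})."


print(tool_call_order_justification(["search", "fetch"]))
print(json_similarity_justification(3, 4))
```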

The base BaseEvaluatorJustification is intentionally left untouched: adding a computed justification there collides with LLMJudgeJustification's real justification field (pydantic raises TypeError). So the computed field lives on each coded subclass instead.
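
The resulting class layout can be sketched as follows, assuming pydantic v2. Only BaseEvaluatorJustification and LLMJudgeJustification are names from the PR; JsonSimilarityJustification is a hypothetical stand-in for the coded subclasses, and every model is reduced to the fields the illustration needs. The sketch shows the working arrangement only and does not reproduce the TypeError from the rejected base-class variant.

```python
from pydantic import BaseModel, computed_field


class BaseEvaluatorJustification(BaseModel):
    """Shared base; intentionally has no `justification` attribute."""


class LLMJudgeJustification(BaseEvaluatorJustification):
    justification: str  # a real, validated input field


class JsonSimilarityJustification(BaseEvaluatorJustification):
    matched_leaves: int
    total_leaves: int

    # Defined on the coded subclass only, so it never meets the real field above.
    @computed_field  # type: ignore[prop-decorator]
    @property
    def justification(self) -> str:
        return f"{self.matched_leaves}/{self.total_leaves} leaves matched."


for model in (
    LLMJudgeJustification(justification="Answer is grounded."),
    JsonSimilarityJustification(matched_leaves=3, total_leaves=4),
):
    # Both dumps expose a string under the key the worker reads.
    print(type(model).__name__, model.model_dump()["justification"])
```

Both models dump a string under justification: one as a validated input field, the other as a value computed from structured detail.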

Verification

  • pytest tests/evaluators tests/cli/eval — all pass
  • mypy — clean
  • ruff check + ruff format — clean

Related

Pairs with a python-eval-worker change that stops flattening these structured justifications, so the full object (including this justification string) reaches the client as justificationObject for per-evaluator-type rendering.
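
On the worker side, the paired change can be pictured as forwarding the whole details object instead of extracting one key. This is a hypothetical sketch: to_client_payload is an invented name, and keeping a flat justification key next to justificationObject is an assumption, since the worker change is not part of this PR.

```python
from typing import Any, Optional


def to_client_payload(details: dict[str, Any]) -> dict[str, Any]:
    # Assumption: the flat string is kept for existing consumers...
    justification: Optional[str] = details.get("justification")
    # ...while the full structured object is forwarded for per-evaluator-type rendering.
    return {"justification": justification, "justificationObject": details}


payload = to_client_payload(
    {
        "lcs": ["search", "fetch"],
        "justification": "Longest common subsequence of length 2: search -> fetch",
    }
)
print(payload["justificationObject"]["lcs"])
```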

Coded evaluators (tool-call count/args/output/order, json-similarity)
stored their explanation under per-evaluator keys (explained_tool_calls_*,
lcs, matched_leaves) with no 'justification' key, so the eval worker's
d.get('justification') always resolved to null while the structured detail
was still present.

Add a computed 'justification' string field to each coded justification
model, derived from its existing structured detail via a shared
format_explained_tool_calls helper. model_dump() now emits a string
justification for every evaluator, matching LLMJudgeJustification, without
changing the structured fields or the worker.
github-actions bot added the test:uipath-langchain and test:uipath-integrations labels on Jun 10, 2026 (test:uipath-langchain triggers tests in the uipath-langchain-python repository).
@sonarqubecloud

Quality Gate failed

Failed conditions
67.7% Coverage on New Code (required ≥ 90%)

See analysis details on SonarQube Cloud
