
feat: blind A/B comparison with dynamic rubrics and post-comparison analysis #581

Merged
christso merged 1 commit into main from feat/571-blind-comparison on Mar 14, 2026


@christso
Collaborator

Closes #571

Changes

  • New agent: blind-comparator.md — bias-free comparison with dynamic rubrics, N-way support
  • New agent: comparison-analyzer.md — post-comparison analysis with improvement suggestions
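
As a rough illustration of the categorized, prioritized suggestions that comparison-analyzer.md is described as producing, the sketch below models one suggestion record in Python. The category names are the ones listed in this PR description; the `Suggestion` class, the `by_priority` helper, and the priority names `high`/`medium`/`low` are assumptions made for illustration and are not taken from the agent file.

```python
from dataclasses import dataclass

# Category names as listed in this PR description.
CATEGORIES = {
    "instructions", "tools", "examples",
    "error_handling", "structure", "references",
}

# Assumed priority names; the PR only says "priority levels".
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Suggestion:
    """One improvement suggestion (hypothetical shape)."""
    category: str
    priority: str
    text: str

    def __post_init__(self):
        # Reject values outside the known categories and priorities.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.priority not in PRIORITY_ORDER:
            raise ValueError(f"unknown priority: {self.priority}")


def by_priority(suggestions):
    """Return suggestions ordered from highest to lowest priority."""
    return sorted(suggestions, key=lambda s: PRIORITY_ORDER[s.priority])
```

Validating the category at construction time keeps a malformed analyzer output from silently passing through to whatever consumes the suggestions.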

Key Features

  • Blind labeling (A, B, C...) prevents confirmation bias
  • Dynamic rubric generation adapts criteria to task type
  • N-way multi-provider comparison (not just binary)
  • Works with all evaluator types (code-judge, tool-trajectory, llm-judge, deterministic)
  • Multi-dimensional scoring (content + structure + evaluator + overall)
  • Post-comparison analysis explains why the winner won, citing specific evidence
  • Categorized improvement suggestions (instructions, tools, examples, error_handling, structure, references) with priority levels
  • Compatible with skill-creator's comparator/analyzer JSON formats
  • Supports workspace evaluation data (file changes, build/test results, tool calls, multi-turn conversations)
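
The blind labeling and multi-dimensional scoring listed above can be sketched roughly as follows. This is not the content of blind-comparator.md, which is an agent definition; the function names `blind_labels`, `overall_score`, and `pick_winner`, the equal weighting of dimensions, and the numeric score scale are all hypothetical.

```python
import random
import string


def blind_labels(outputs, seed=None):
    """Shuffle provider outputs and assign neutral labels A, B, C, ...

    Returns (labeled, label_key). `labeled` maps label -> output and is what
    the judge sees; `label_key` maps label -> provider name and stays hidden
    until scoring is finished. Supports up to 26 providers.
    """
    rng = random.Random(seed)
    providers = list(outputs)
    rng.shuffle(providers)  # random order, so label position carries no hint
    labels = string.ascii_uppercase[: len(providers)]
    labeled = {label: outputs[p] for label, p in zip(labels, providers)}
    label_key = dict(zip(labels, providers))
    return labeled, label_key


def overall_score(scores):
    """Average the content, structure, and evaluator dimensions (assumed equal weights)."""
    dims = ("content", "structure", "evaluator")
    return sum(scores[d] for d in dims) / len(dims)


def pick_winner(scores_by_label, label_key):
    """Pick the label with the highest overall score, then unblind it."""
    best = max(scores_by_label, key=lambda lbl: overall_score(scores_by_label[lbl]))
    return best, label_key[best]
```

Because the mapping from label to provider is only consulted in `pick_winner`, the scoring step never sees provider names, which is the property the blind labeling feature is meant to guarantee.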

…nalysis (#571)

Add blind-comparator and comparison-analyzer agents for bias-free
evaluation comparison with dynamic task-specific rubrics, N-way
provider support, and post-comparison analysis with improvement
suggestions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: 891f644
Status: ✅  Deploy successful!
Preview URL: https://2f0f58fd.agentv.pages.dev
Branch Preview URL: https://feat-571-blind-comparison.agentv.pages.dev


@christso marked this pull request as ready for review March 14, 2026 05:36
@christso merged commit b64f0c4 into main Mar 14, 2026
1 check passed
@christso deleted the feat/571-blind-comparison branch March 14, 2026 05:40
