Replies: 1 comment
Some ideas on how to evaluate:

### Evaluation Strategies

#### 1. Pre-defined Evaluators

We provide a set of pre-defined evaluators that users can select and apply to their agents.
#### 2. User-Defined Evaluators (Console LLM-as-Judge)

Users define custom evaluators by modifying prompts and context via placeholders in the AMP Console, at two levels:

- Span-level evaluators
- Trace-level evaluators
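To make the placeholder idea concrete, here is a minimal sketch of how a console-defined judge prompt might be rendered before it is sent to the judge model. The placeholder names (`{input}`, `{output}`) and the template shape are illustrative assumptions, not the actual AMP Console syntax:

```python
# Hypothetical span-level LLM-as-judge template; placeholder names are
# illustrative, not the real AMP Console placeholder syntax.
SPAN_JUDGE_PROMPT = """\
You are grading a single agent step.
User input: {input}
Agent output: {output}
Answer PASS or FAIL, then give a one-line reason."""

def render_judge_prompt(template: str, **fields: str) -> str:
    """Fill the console placeholders with values from the recorded span."""
    return template.format(**fields)

prompt = render_judge_prompt(
    SPAN_JUDGE_PROMPT,
    input="What is 2 + 2?",
    output="4",
)
print(prompt)
```

A trace-level evaluator would work the same way, except the placeholders would be filled from the whole trace (all spans) rather than a single step.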
#### 3. User-Imported Evaluators (Code-First)

For complex logic, we provide an SDK-driven approach that lets users write and import custom evaluation scripts.

Process of writing evaluator scripts:
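As a sketch of what a code-first evaluator script could look like: the `Span`, `Trace`, and `EvalResult` types below are stand-ins I am assuming for illustration, not the real AMP SDK surface. The point is the shape of the contract: a function that takes a trace and returns a score plus a reason.

```python
# Illustrative shape of a code-first evaluator script. The Trace/Span/
# EvalResult types are assumptions standing in for the real SDK types.
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    output: str

@dataclass
class Trace:
    spans: list[Span] = field(default_factory=list)

@dataclass
class EvalResult:
    score: float
    reason: str

def tool_success_rate(trace: Trace) -> EvalResult:
    """Trace-level evaluator: fraction of tool spans with non-empty output."""
    tool_spans = [s for s in trace.spans if s.name.startswith("tool.")]
    if not tool_spans:
        return EvalResult(score=1.0, reason="no tool calls to grade")
    ok = sum(1 for s in tool_spans if s.output.strip())
    return EvalResult(
        score=ok / len(tool_spans),
        reason=f"{ok}/{len(tool_spans)} tool calls returned output",
    )

trace = Trace(spans=[
    Span("tool.search", "results..."),
    Span("tool.fetch", ""),
    Span("llm.answer", "done"),
])
print(tool_success_rate(trace))  # score 0.5: one of two tool calls succeeded
```

Keeping evaluators as pure functions over a trace makes them easy to unit-test locally before they are ever run against production data.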
#### Importing Evaluator Scripts to the Platform

Once a script is written and verified, it must be registered with the AMP Platform to automate the evaluation workflow.
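Registration would presumably attach metadata telling the platform how and where to run the script. The manifest below is purely a sketch; every field name is an assumption about what such a registration payload might contain, not the real AMP Platform API:

```python
# Hypothetical registration manifest; all field names are illustrative
# assumptions, not the actual AMP Platform registration schema.
import json

manifest = {
    "name": "tool_success_rate",                      # evaluator identifier
    "level": "trace",                                 # span | trace
    "entrypoint": "evaluators/tool_success_rate.py",  # script to run
    "runtime": "python3.11",                          # execution environment
    "sampling_rate": 0.1,  # fraction of production traces to evaluate
}
print(json.dumps(manifest, indent=2))
```

A sampling rate matters here because running every evaluator on every production trace is exactly the kind of risk to production workloads the Problem section warns about.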
Key Considerations:
## Problem

Agents and AI-driven workflows are inherently non-deterministic. Without a structured, measurable evaluation process, teams cannot:
While multiple evaluation frameworks exist today, integrating them into real production agents, or embedding them naturally into the software development lifecycle, is still clumsy, fragmented, and largely manual. Integrating evaluation directly into production workloads can also be risky. This is where a platform must step in: to make agent evaluation a first-class, repeatable capability that works seamlessly both during development and in production.
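The non-determinism point is why a single evaluation run is not enough: the same agent on the same input can pass one run and fail the next, so a platform needs repeated runs and aggregate metrics. A minimal sketch, using a stub in place of a real agent:

```python
# Why one run is not a measurement: a stub agent that answers correctly
# only 70% of the time, evaluated over many seeded runs.
import random

def flaky_agent(question: str, rng: random.Random) -> str:
    # Stand-in for a real non-deterministic agent.
    return "4" if rng.random() < 0.7 else "5"

def pass_rate(n_runs: int, seed: int = 0) -> float:
    """Aggregate pass rate over repeated runs of the same input."""
    rng = random.Random(seed)
    passes = sum(
        flaky_agent("What is 2 + 2?", rng) == "4" for _ in range(n_runs)
    )
    return passes / n_runs

print(pass_rate(1000))  # roughly 0.7; any single run would report 0.0 or 1.0
```

Reporting the aggregate (and its variance) rather than a single pass/fail is what makes evaluation results comparable across agent versions.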
## What We’re Looking For

We want concrete ideas around how evaluations should work end-to-end on the platform, not just individual features:

- Define a clear, opinionated process for running evaluations
- Design a UX that makes evaluation usable
- Clarify the evaluator surface area
- Define where evaluations can run