diff --git a/config/_default/menus/main.en.yaml b/config/_default/menus/main.en.yaml index 97bb1b1efbf..4c0b3b397dd 100644 --- a/config/_default/menus/main.en.yaml +++ b/config/_default/menus/main.en.yaml @@ -2098,7 +2098,7 @@ menu: parent: platform_heading identifier: internal_developer_portal weight: 110000 - - name: Catalog + - name: Catalog url: internal_developer_portal/catalog/ parent: internal_developer_portal identifier: catalog @@ -4551,7 +4551,7 @@ menu: parent: tracing identifier: tracing_services weight: 9 - - name: Catalog + - name: Catalog url: /internal_developer_portal/catalog/ parent: tracing_services identifier: tracing_software_catalog @@ -5484,21 +5484,21 @@ menu: parent: llm_obs_custom_llm_as_a_judge_evaluations identifier: llm_obs_custom_llm_as_a_judge_evaluations_template weight: 40101 - - name: Trace-Level Evaluations - url: llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations - parent: llm_obs_custom_llm_as_a_judge_evaluations - identifier: llm_obs_custom_llm_as_a_judge_evaluations_trace_level - weight: 40102 - name: Session-Level Evaluations url: llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations parent: llm_obs_custom_llm_as_a_judge_evaluations identifier: llm_obs_custom_llm_as_a_judge_evaluations_session_level - weight: 401021 + weight: 40102 + - name: Trace-Level Evaluations + url: llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations + parent: llm_obs_custom_llm_as_a_judge_evaluations + identifier: llm_obs_custom_llm_as_a_judge_evaluations_trace_level + weight: 40103 - name: Prompt Templating url: llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating parent: llm_obs_custom_llm_as_a_judge_evaluations identifier: llm_obs_custom_llm_as_a_judge_evaluations_prompt_templating - weight: 40103 + weight: 40104 - name: NeMo url: llm_observability/evaluations/submit_nemo_evaluations parent: 
llm_obs_external_evaluations diff --git a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md index 2b251de5a89..d718c1ff4f3 100644 --- a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md +++ b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md @@ -1,69 +1,86 @@ --- title: Prompt Templating -description: Reference for the templating used in custom LLM-as-a-judge evaluation prompts—variables, array operators, span filters, and resolution rules. +description: Reference for the templating used in custom LLM-as-a-judge evaluation prompts—variables, array operators, span and trace filters, session paths, and resolution rules. further_reading: - link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations" tag: "Documentation" text: "Custom LLM-as-a-Judge Evaluations" +- link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations" + tag: "Documentation" + text: "Session-Level Evaluations" - link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations" tag: "Documentation" text: "Trace-Level Evaluations" --- -Custom LLM-as-a-judge prompts inject span or trace data into the {{< ui >}}User{{< /ui >}} message by wrapping a field path in `{{ ... }}`. The System Prompt holds the static instructions to the LLM judge and does not resolve placeholders. The same syntax works in both the test pane and at evaluation time. +Custom LLM-as-a-judge prompts inject session, trace, or span data into the {{< ui >}}User{{< /ui >}} message by wrapping a field path in `{{ ... }}`. The System Prompt holds the static instructions to the LLM judge and does not resolve placeholders. The same syntax works in both the test pane and at evaluation time. 
Which paths are available depends on the evaluation scope you choose: session, trace, or span. ## At a glance | Pattern | Description | |---|---| -| `{{name}}` | Direct field | -| `{{meta.input.value}}` | Dot notation for nested fields | -| `{{meta.input.messages[0].content}}` | Array index (0-based) | -| `{{meta.input.messages[1,3].content}}` | Inclusive array range | -| `{{meta.input.messages[*].content}}` | Array wildcard (fan-out) | -| `{{meta.input.messages.content}}` | Implicit fan-out (same as `[*]`) | -| `{{span_input}}`, `{{span_output}}` | Span-scope aliases | +| `{{traces}}` | Every trace in the session as JSON (session scope) | +| `{{traces[0].spans[0].meta.input.value}}` | First span of the first trace (session scope) | +| `{{traces[*].spans[*].name}}` | Fan-out across traces and spans (session scope) | +| `{{traces[meta.span.kind:llm].spans[*].meta.output.value}}` | Filter spans by attribute across a session (session scope) | +| `{{spans}}` | Every span in the trace as JSON (trace scope) | | `{{spans[0].name}}` | Pick one span from a trace (trace scope) | | `{{spans[name:my-span].meta.input.value}}` | Filter spans by attribute (trace scope) | -| `{{spans}}` | Every span in the trace as JSON (trace scope) | -| `{{*}}` | Entire span or trace payload as JSON | +| `{{name}}` | Direct field (span scope) | +| `{{meta.input.value}}` | Dot notation for nested fields (span scope) | +| `{{meta.input.messages[0].content}}` | Array index, 0-based (span scope) | +| `{{meta.input.messages[1,3].content}}` | Inclusive array range (span scope) | +| `{{meta.input.messages[*].content}}` | Array wildcard, fan-out (span scope) | +| `{{meta.input.messages.content}}` | Implicit fan-out, same as `[*]` (span scope) | +| `{{span_input}}`, `{{span_output}}` | Span-scope aliases | +| `{{*}}` | Entire session, trace, or span payload as JSON | The autocomplete dropdown opens after you type `{{` and lists fields available on the selected sample.
-## Span-scope syntax +## Session-scope syntax -Span-scope evaluations expose a single span per evaluation. Reference fields by their JSON path on the span. +Session-scope evaluations expose every trace in the [user session][1] under the `traces` array. Each trace includes its own `spans` array, so you can read across traces and spans in one prompt. Use `{{traces...}}` paths (and nested `{{traces[...].spans...}}` paths) to build session-level judges. The `{{span_input}}` and `{{span_output}}` aliases are not available in session scope. -### Built-in aliases +Session-level evaluations require spans to be tagged with a `session_id`. See [Tracking user sessions][1] to instrument your application. A session is considered complete after **30 minutes** of inactivity (no new spans for that session, measured from the most recent span); the evaluation runs once at that point with every trace and span from the session. Spans that arrive more than 30 minutes after the previous span are not included. See [Session-Level Evaluations][2] for configuration, example prompts, and when to choose session scope over trace or span scope. -| Alias | Resolves to | -|---|---| -| `{{span_input}}` | `meta.input.messages[*].content` for LLM spans, `meta.input.value` otherwise | -| `{{span_output}}` | `meta.output.messages[*].content` for LLM spans, `meta.output.value` otherwise | +### Reference the whole session -The aliases adapt to the kind of span being evaluated, so you don't have to branch on whether the span is an LLM call or an agent step.
+``` +{{traces}} # JSON of every trace in the session (each trace includes its spans) +{{*}} # Entire session payload as JSON, including top-level metadata +``` -### Direct field paths +### Pick a trace or span by index ``` -{{name}} -{{meta.input.value}} -{{meta.output.value}} -{{metrics.input_tokens}} +{{traces[0].spans[0].meta.input.value}} # First span of the first trace +{{traces[*].spans[*].name}} # Newline-joined names of every span in the session +{{traces[1].spans}} # JSON of every span in the second trace ``` -### Array access +### Filter traces or spans by attribute ``` -{{meta.input.messages[0].content}} # First message only -{{meta.input.messages[*].content}} # All messages, joined with newlines -{{meta.input.messages[0,2].content}} # Inclusive range; out-of-bounds ends are clamped -{{meta.input.messages.content}} # Implicit fan-out, equivalent to [*] +{{traces[0].spans[name:my-span].meta.input.value}} +{{traces[*].spans[meta.span.kind:llm].meta.output.value}} +{{traces[meta.span.kind:llm].spans[*].meta.output.value}} +{{traces[meta.span.kind:tool].spans[*].meta.input.parameters}} +``` + +`[field.path:value]` on `traces` keeps only traces whose field at `field.path` equals `value`. The same filter syntax on `spans` (within a trace path) keeps only matching spans. Combine filters and deeper paths to extract inputs or outputs across the session. Filters fall back to an empty string when nothing matches. + +### Fan-out across traces + +Use `[*]` on `traces` or `spans` the same way as in trace scope: values from every matching trace or span are collected and joined with newlines (`\n`), or serialized as JSON when the resolved values are objects. + +``` +{{traces[meta.span.kind:llm].meta.input.messages[*].content}} +{{traces[meta.span.kind:llm].meta.output.messages[*].content}} ``` ## Trace-scope syntax -Trace-scope evaluations expose every span in the trace under the `spans` array. Use `{{spans...}}` paths to read across spans. 
The `{{span_input}}` and `{{span_output}}` aliases are not available in trace scope. +Trace-scope evaluations expose every span in the trace under the `spans` array. Use `{{spans...}}` paths to read across spans. The `{{span_input}}` and `{{span_output}}` aliases are not available in trace scope. See [Trace-Level Evaluations][3] for configuration, example prompts, and when to choose trace scope. ### Reference the whole trace @@ -89,6 +106,37 @@ Trace-scope evaluations expose every span in the trace under the `spans` array. `[field.path:value]` keeps only the spans whose field at `field.path` equals `value`. Combine with deeper paths to extract the inputs or outputs of the matching spans. The filter falls back to an empty string if no span matches. +## Span-scope syntax + +Span-scope evaluations expose a single span per evaluation. Reference fields by their JSON path on the span. + +### Built-in aliases + +| Alias | Resolves to | +|---|---| +| `{{span_input}}` | `meta.input.messages[*].content` for LLM spans, `meta.input.value` otherwise | +| `{{span_output}}` | `meta.output.messages[*].content` for LLM spans, `meta.output.value` otherwise | + +The aliases adapt to the kind of span being evaluated, so you don't have to branch on whether the span is an LLM call or an agent step. + +### Direct field paths + +``` +{{name}} +{{meta.input.value}} +{{meta.output.value}} +{{metrics.input_tokens}} +``` + +### Array access + +``` +{{meta.input.messages[0].content}} # First message only +{{meta.input.messages[*].content}} # All messages, joined with newlines +{{meta.input.messages[0,2].content}} # Inclusive range; out-of-bounds ends are clamped +{{meta.input.messages.content}} # Implicit fan-out, equivalent to [*] +``` + ## Resolution rules | Result | Behavior | @@ -118,11 +166,16 @@ For example, given a span where `meta.input.messages` is: ## Tips -- Type `{{` in the prompt editor to open the autocomplete dropdown. 
The list adapts to the scope (span or trace) and to the sample selected on the right. -- Pick a sample row in the {{< ui >}}Filtered Spans{{< /ui >}} panel (span scope) or the {{< ui >}}Spans in Selected Trace{{< /ui >}} panel (trace scope), then click {{< ui >}}Test Evaluation{{< /ui >}} to preview how each placeholder resolves on real data before saving the configuration. +- Type `{{` in the prompt editor to open the autocomplete dropdown. The list adapts to the scope (session, trace, or span) and to the sample selected on the right. +- Pick a sample in the panel on the right, then click {{< ui >}}Test Evaluation{{< /ui >}} to preview how each placeholder resolves on real data before saving the configuration. The panel is the sample session pane, which lists the traces in the session (session scope), {{< ui >}}Spans in Selected Trace{{< /ui >}} (trace scope), or {{< ui >}}Filtered Spans{{< /ui >}} (span scope). - Use the three-dots menu on a sample's JSON view and select {{< ui >}}Add variable to message{{< /ui >}} to insert a field path into the prompt without typing it. - Pass `{{*}}` when you want the LLM judge to see the full payload—useful for free-form prompts that decide for themselves which fields matter. +- Prefer `{{traces}}` or targeted `{{traces[...].spans...}}` paths for session judges when you need cross-turn context; use `{{spans}}` when a single trace is enough. See [Session-Level Evaluations][2] for scope guidance and example prompts.
## Further Reading {{< partial name="whats-next/whats-next.html" >}} + +[1]: /llm_observability/instrumentation/sdk/#tracking-user-sessions +[2]: /llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations +[3]: /llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations \ No newline at end of file diff --git a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations.md b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations.md index 3a9f18cea88..3811d2c0350 100644 --- a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations.md +++ b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations.md @@ -51,10 +51,6 @@ The walkthrough below highlights the parts of the configuration that are specifi 1. Pick a sample session from the panel on the right. The pane lists the traces in that session, with the fields referenced by your prompt highlighted. - {{< img src="llm_observability/evaluations/session_level_sample_session.png" alt="The configuration page in session scope, with the sample session pane on the right showing traces and highlighted span fields." style="width:100%;" >}} - - Clicking on a session then lists the traces in that session, with the fields referenced by your prompt highlighted. - {{< img src="llm_observability/evaluations/session_level_sample_session_trace_view.png" alt="The configuration page in session scope, with the sample session pane on the right showing traces and highlighted span fields." style="width:100%;" >}}
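The placeholder resolution behavior documented for these paths (index, inclusive range, wildcard, and attribute-filter selectors; implicit fan-out; newline joining; JSON serialization of objects; and the empty-string fallback) can be approximated with the short Python sketch below. It is not Datadog's implementation. The payload shape (a session dict with a `traces` list whose items each hold a `spans` list), the hypothetical helpers `resolve`, `resolve_alias`, and `render`, the choice to serialize a fan-out of objects as a single JSON array, and the use of `meta.span.kind == "llm"` to recognize LLM spans are all assumptions made for illustration.

```python
import json
import re
from typing import Any

# Hypothetical sketch of LLM-as-a-judge placeholder resolution.
# Not Datadog's implementation; payload shapes and helper names are assumptions.

# One path segment: an optional field name plus an optional [...] selector.
_SEGMENT = re.compile(r"([^.\[\]]*)(?:\[([^\]]*)\])?")
# A placeholder such as {{ meta.input.value }}; surrounding spaces are ignored.
_PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def _split_path(path: str) -> list[str]:
    """Split a path on dots, ignoring dots inside [...] filters."""
    parts, depth, current = [], 0, ""
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "." and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def _get(obj: Any, dotted: str) -> Any:
    """Follow a plain dotted path such as 'meta.span.kind'; None when missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def _select(items: list, selector: str) -> tuple[list, bool]:
    """Apply [N], [A,B], [*] or [field.path:value]; return (matches, fanned_out)."""
    selector = selector.strip()
    if selector == "*":
        return list(items), True
    if ":" in selector:  # attribute filter: keep items whose field equals value
        field, value = selector.split(":", 1)
        return [i for i in items if (v := _get(i, field)) is not None and str(v) == value], True
    if "," in selector:  # inclusive range; out-of-bounds ends are clamped
        start, end = (int(s) for s in selector.split(",", 1))
        return items[max(start, 0):min(end, len(items) - 1) + 1], True
    if selector.isdigit():  # single 0-based index
        index = int(selector)
        return ([items[index]] if index < len(items) else []), False
    return [], False


def resolve(payload: dict, path: str) -> str:
    """Resolve one placeholder path against a session, trace, or span payload."""
    if path == "*":
        return json.dumps(payload)  # {{*}}: the entire payload as JSON
    values, fanned_out = [payload], False
    for segment in _split_path(path):
        match = _SEGMENT.fullmatch(segment)
        if match is None:
            return ""
        name, selector = match.groups()
        next_values: list = []
        for value in values:
            picked = [value]
            if name:
                if isinstance(value, list):  # implicit fan-out, same as [*]
                    fanned_out = True
                    targets = value
                else:
                    targets = [value]
                picked = [t[name] for t in targets if isinstance(t, dict) and name in t]
            if selector is not None:
                selected = []
                for item in picked:
                    if isinstance(item, list):
                        matches, fanned = _select(item, selector)
                        selected.extend(matches)
                        fanned_out = fanned_out or fanned
                picked = selected
            next_values.extend(picked)
        values = next_values
    if not values:
        return ""  # nothing matched: fall back to an empty string
    if not fanned_out:
        value = values[0]
        return value if isinstance(value, str) else json.dumps(value)
    if all(isinstance(v, (dict, list)) for v in values):
        return json.dumps(values)  # assumption: objects become one JSON array
    return "\n".join(v if isinstance(v, str) else json.dumps(v) for v in values)


def resolve_alias(span: dict, alias: str) -> str:
    """{{span_input}} / {{span_output}}; assumes LLM spans have meta.span.kind == 'llm'."""
    side = "input" if alias == "span_input" else "output"
    if _get(span, "meta.span.kind") == "llm":
        return resolve(span, f"meta.{side}.messages[*].content")
    return resolve(span, f"meta.{side}.value")


def render(template: str, payload: dict) -> str:
    """Replace every {{ ... }} placeholder in a User message template."""
    def substitute(m: re.Match) -> str:
        path = m.group(1)
        if path in ("span_input", "span_output"):
            return resolve_alias(payload, path)
        return resolve(payload, path)
    return _PLACEHOLDER.sub(substitute, template)
```

Tracking a single `fanned_out` flag is what lets the sketch return a lone value unchanged while joining fan-out results with newlines. A production resolver may differ, for example in how it compares non-string filter values.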
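The 30-minute inactivity rule for session completion can be modeled as a gap check over span times. This hypothetical `evaluated_session` helper assumes that span timestamps equal arrival times and that a gap of exactly 30 minutes keeps the session open; the documentation only states that spans arriving more than 30 minutes after the previous span are not included.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical model of the documented 30-minute inactivity rule. Span
# timestamps stand in for arrival times, and a gap of exactly 30 minutes is
# assumed to keep the session open.
INACTIVITY_WINDOW = timedelta(minutes=30)


def evaluated_session(span_times: list[datetime]) -> tuple[list[datetime], Optional[datetime]]:
    """Return the span times included in the evaluation and when it runs."""
    included: list[datetime] = []
    for t in sorted(span_times):
        if included and t - included[-1] > INACTIVITY_WINDOW:
            break  # the session already closed; later spans are not included
        included.append(t)
    # The evaluation runs once, 30 minutes after the most recent included span.
    runs_at = included[-1] + INACTIVITY_WINDOW if included else None
    return included, runs_at
```

For example, spans at minutes 0, 10, 35, and 70 produce one evaluation at minute 65 covering the first three spans; the span at minute 70 arrives after the session has already closed.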