From 9b3a4778cd51e05a833e59b7722e25236ae5ff20 Mon Sep 17 00:00:00 2001 From: Jennifer Mickel Date: Fri, 29 May 2026 16:01:12 -0400 Subject: [PATCH 1/3] addressed Rashel's suggestions --- .../session_level_evaluations.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations.md b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations.md index 3a9f18cea88..3811d2c0350 100644 --- a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations.md +++ b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations.md @@ -51,10 +51,6 @@ The walkthrough below highlights the parts of the configuration that are specifi 1. Pick a sample session from the panel on the right. The pane lists the traces in that session, with the fields referenced by your prompt highlighted. - {{< img src="llm_observability/evaluations/session_level_sample_session.png" alt="The configuration page in session scope, with the sample session pane on the right showing traces and highlighted span fields." style="width:100%;" >}} - - Clicking on a session then lists the traces in that session, with the fields referenced by your prompt highlighted. - {{< img src="llm_observability/evaluations/session_level_sample_session_trace_view.png" alt="The configuration page in session scope, with the sample session pane on the right showing traces and highlighted span fields." 
style="width:100%;" >}} From 1d2b4656b8f68f7918ae224ccefcfb328207b9b6 Mon Sep 17 00:00:00 2001 From: Jennifer Mickel Date: Fri, 29 May 2026 16:17:18 -0400 Subject: [PATCH 2/3] added traces to prompt_templating.md --- .../prompt_templating.md | 62 +++++++++++++++++-- 1 file changed, 57 insertions(+), 5 deletions(-) diff --git a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md index 2b251de5a89..687d6211e21 100644 --- a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md +++ b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md @@ -1,6 +1,6 @@ --- title: Prompt Templating -description: Reference for the templating used in custom LLM-as-a-judge evaluation prompts—variables, array operators, span filters, and resolution rules. +description: Reference for the templating used in custom LLM-as-a-judge evaluation prompts—variables, array operators, span and trace filters, session paths, and resolution rules. further_reading: - link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations" tag: "Documentation" @@ -8,9 +8,12 @@ further_reading: - link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations" tag: "Documentation" text: "Trace-Level Evaluations" +- link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations" + tag: "Documentation" + text: "Session-Level Evaluations" --- -Custom LLM-as-a-judge prompts inject span or trace data into the {{< ui >}}User{{< /ui >}} message by wrapping a field path in `{{ ... }}`. The System Prompt holds the static instructions to the LLM judge and does not resolve placeholders. The same syntax works in both the test pane and at evaluation time. 
+Custom LLM-as-a-judge prompts inject span, trace, or session data into the {{< ui >}}User{{< /ui >}} message by wrapping a field path in `{{ ... }}`. The System Prompt holds the static instructions to the LLM judge and does not resolve placeholders. The same syntax works in both the test pane and at evaluation time. Which paths are available depends on the evaluation scope you choose—span, trace, or session. ## At a glance @@ -26,7 +29,11 @@ Custom LLM-as-a-judge prompts inject span or trace data into the {{< ui >}}User{ | `{{spans[0].name}}` | Pick one span from a trace (trace scope) | | `{{spans[name:my-span].meta.input.value}}` | Filter spans by attribute (trace scope) | | `{{spans}}` | Every span in the trace as JSON (trace scope) | -| `{{*}}` | Entire span or trace payload as JSON | +| `{{traces[0].spans[0].meta.input.value}}` | First span of the first trace (session scope) | +| `{{traces[*].spans[*].name}}` | Fan-out across traces and spans (session scope) | +| `{{traces[meta.span.kind:llm].spans[*].meta.output.value}}` | Filter spans by attribute across a session (session scope) | +| `{{traces}}` | Every trace in the session as JSON (session scope) | +| `{{*}}` | Entire span, trace, or session payload as JSON | The autocomplete dropdown opens after you type `{{` and lists fields available on the selected sample. @@ -89,6 +96,47 @@ Trace-scope evaluations expose every span in the trace under the `spans` array. `[field.path:value]` keeps only the spans whose field at `field.path` equals `value`. Combine with deeper paths to extract the inputs or outputs of the matching spans. The filter falls back to an empty string if no span matches. +## Session-scope syntax + +Session-scope evaluations expose every trace in the [user session][1] under the `traces` array. Each trace includes its own `spans` array, so you can read across traces and spans in one prompt. Use `{{traces...}}` paths (and nested `{{traces...].spans...}}` paths) to build session-level judges. 
The `{{span_input}}` and `{{span_output}}` aliases are not available in session scope. + +Session-level evaluations require spans to be tagged with a `session_id`. See [Tracking user sessions][1] to instrument your application. A session is considered complete after **30 minutes** of inactivity (no new spans for that session, measured from the most recent span); the evaluation runs once at that point with every trace and span from the session. Spans that arrive more than 30 minutes after the previous span are not included. See [Session-Level Evaluations][2] for configuration, example prompts, and when to choose session scope over trace or span scope. + +### Reference the whole session + +``` +{{traces}} # JSON of every trace in the session (each trace includes its spans) +{{*}} # Entire session payload as JSON, including top-level metadata +``` + +### Pick a trace or span by index + +``` +{{traces[0].spans[0].meta.input.value}} # First span of the first trace +{{traces[*].spans[*].name}} # Newline-joined names of every span in the session +{{traces[1].spans}} # JSON of every span in the second trace +``` + +### Filter traces or spans by attribute + +``` +{{traces[0].spans[name:my-span].meta.input.value}} +{{traces[*].spans[meta.span.kind:llm].meta.output.value}} +{{traces[meta.span.kind:llm].spans[*].meta.output.value}} +{{traces[meta.span.kind:tool].spans[*].meta.input.parameters}} +``` + +`[field.path:value]` on `traces` keeps only traces whose field at `field.path` equals `value`. The same filter syntax on `spans` (within a trace path) keeps only matching spans. Combine filters and deeper paths to extract inputs or outputs across the session. Filters fall back to an empty string when nothing matches. + +### Fan-out across traces + +Use `[*]` on `traces` or `spans` the same way as in trace scope: values from every matching trace or span are collected and joined with newlines (`\n`), or serialized as JSON when the resolved values are objects. 
+ +``` +{{traces[meta.span.kind:llm].meta.input.messages[*].content}} +{{traces[meta.span.kind:llm].meta.output.messages[*].content}} +``` + ## Resolution rules | Result | Behavior | @@ -118,11 +166,15 @@ For example, given a span where `meta.input.messages` is: ## Tips -- Type `{{` in the prompt editor to open the autocomplete dropdown. The list adapts to the scope (span or trace) and to the sample selected on the right. -- Pick a sample row in the {{< ui >}}Filtered Spans{{< /ui >}} panel (span scope) or the {{< ui >}}Spans in Selected Trace{{< /ui >}} panel (trace scope), then click {{< ui >}}Test Evaluation{{< /ui >}} to preview how each placeholder resolves on real data before saving the configuration. +- Type `{{` in the prompt editor to open the autocomplete dropdown. The list adapts to the scope (span, trace, or session) and to the sample selected on the right. +- Pick a sample in the panel on the right—{{< ui >}}Filtered Spans{{< /ui >}} (span scope), {{< ui >}}Spans in Selected Trace{{< /ui >}} (trace scope), or the sample session pane listing traces in the session (session scope)—then click {{< ui >}}Test Evaluation{{< /ui >}} to preview how each placeholder resolves on real data before saving the configuration. - Use the three-dots menu on a sample's JSON view and select {{< ui >}}Add variable to message{{< /ui >}} to insert a field path into the prompt without typing it. - Pass `{{*}}` when you want the LLM judge to see the full payload—useful for free-form prompts that decide for themselves which fields matter. +- Prefer `{{traces}}` or targeted `{{traces...].spans...}}` paths for session judges when you need cross-turn context; use `{{spans}}` when a single trace is enough. See [Session-Level Evaluations][2] for scope guidance and example prompts. 
## Further Reading {{< partial name="whats-next/whats-next.html" >}} + +[1]: /llm_observability/instrumentation/sdk/#tracking-user-sessions +[2]: /llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations From 2f693a92c284b08816b68ad35772ad85ba887720 Mon Sep 17 00:00:00 2001 From: Jennifer Mickel Date: Fri, 29 May 2026 17:22:53 -0400 Subject: [PATCH 3/3] doc pr addressing Rashel's further comments --- config/_default/menus/main.en.yaml | 18 +-- .../prompt_templating.md | 131 +++++++++--------- 2 files changed, 75 insertions(+), 74 deletions(-) diff --git a/config/_default/menus/main.en.yaml b/config/_default/menus/main.en.yaml index 97bb1b1efbf..4c0b3b397dd 100644 --- a/config/_default/menus/main.en.yaml +++ b/config/_default/menus/main.en.yaml @@ -2098,7 +2098,7 @@ menu: parent: platform_heading identifier: internal_developer_portal weight: 110000 - - name: Catalog + - name: Catalog url: internal_developer_portal/catalog/ parent: internal_developer_portal identifier: catalog @@ -4551,7 +4551,7 @@ menu: parent: tracing identifier: tracing_services weight: 9 - - name: Catalog + - name: Catalog url: /internal_developer_portal/catalog/ parent: tracing_services identifier: tracing_software_catalog @@ -5484,21 +5484,21 @@ menu: parent: llm_obs_custom_llm_as_a_judge_evaluations identifier: llm_obs_custom_llm_as_a_judge_evaluations_template weight: 40101 - - name: Trace-Level Evaluations - url: llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations - parent: llm_obs_custom_llm_as_a_judge_evaluations - identifier: llm_obs_custom_llm_as_a_judge_evaluations_trace_level - weight: 40102 - name: Session-Level Evaluations url: llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations parent: llm_obs_custom_llm_as_a_judge_evaluations identifier: llm_obs_custom_llm_as_a_judge_evaluations_session_level - weight: 401021 + weight: 40102 + - name: Trace-Level Evaluations + url: 
llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations + parent: llm_obs_custom_llm_as_a_judge_evaluations + identifier: llm_obs_custom_llm_as_a_judge_evaluations_trace_level + weight: 40103 - name: Prompt Templating url: llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating parent: llm_obs_custom_llm_as_a_judge_evaluations identifier: llm_obs_custom_llm_as_a_judge_evaluations_prompt_templating - weight: 40103 + weight: 40104 - name: NeMo url: llm_observability/evaluations/submit_nemo_evaluations parent: llm_obs_external_evaluations diff --git a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md index 687d6211e21..d718c1ff4f3 100644 --- a/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md +++ b/content/en/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating.md @@ -5,72 +5,82 @@ further_reading: - link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations" tag: "Documentation" text: "Custom LLM-as-a-Judge Evaluations" -- link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations" - tag: "Documentation" - text: "Trace-Level Evaluations" - link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations" tag: "Documentation" text: "Session-Level Evaluations" +- link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations" + tag: "Documentation" + text: "Trace-Level Evaluations" --- -Custom LLM-as-a-judge prompts inject span, trace, or session data into the {{< ui >}}User{{< /ui >}} message by wrapping a field path in `{{ ... }}`. The System Prompt holds the static instructions to the LLM judge and does not resolve placeholders. 
The same syntax works in both the test pane and at evaluation time. Which paths are available depends on the evaluation scope you choose—span, trace, or session. +Custom LLM-as-a-judge prompts inject session, trace, or span data into the {{< ui >}}User{{< /ui >}} message by wrapping a field path in `{{ ... }}`. The System Prompt holds the static instructions to the LLM judge and does not resolve placeholders. The same syntax works in both the test pane and at evaluation time. Which paths are available depends on the evaluation scope you choose—session, trace, or span. ## At a glance | Pattern | Description | |---|---| -| `{{name}}` | Direct field | -| `{{meta.input.value}}` | Dot notation for nested fields | -| `{{meta.input.messages[0].content}}` | Array index (0-based) | -| `{{meta.input.messages[1,3].content}}` | Inclusive array range | -| `{{meta.input.messages[*].content}}` | Array wildcard (fan-out) | -| `{{meta.input.messages.content}}` | Implicit fan-out (same as `[*]`) | -| `{{span_input}}`, `{{span_output}}` | Span-scope aliases | -| `{{spans[0].name}}` | Pick one span from a trace (trace scope) | -| `{{spans[name:my-span].meta.input.value}}` | Filter spans by attribute (trace scope) | -| `{{spans}}` | Every span in the trace as JSON (trace scope) | +| `{{traces}}` | Every trace in the session as JSON (session scope) | | `{{traces[0].spans[0].meta.input.value}}` | First span of the first trace (session scope) | | `{{traces[*].spans[*].name}}` | Fan-out across traces and spans (session scope) | | `{{traces[meta.span.kind:llm].spans[*].meta.output.value}}` | Filter spans by attribute across a session (session scope) | -| `{{traces}}` | Every trace in the session as JSON (session scope) | -| `{{*}}` | Entire span, trace, or session payload as JSON | +| `{{spans}}` | Every span in the trace as JSON (trace scope) | +| `{{spans[0].name}}` | Pick one span from a trace (trace scope) | +| `{{spans[name:my-span].meta.input.value}}` | Filter spans by attribute 
(trace scope) | +| `{{name}}` | Direct field (span scope) | +| `{{meta.input.value}}` | Dot notation for nested fields (span scope) | +| `{{meta.input.messages[0].content}}` | Array index (0-based) (span scope) | +| `{{meta.input.messages[1,3].content}}` | Inclusive array range (span scope) | +| `{{meta.input.messages[*].content}}` | Array wildcard (fan-out) (span scope) | +| `{{meta.input.messages.content}}` | Implicit fan-out (same as `[*]`) (span scope) | +| `{{span_input}}`, `{{span_output}}` | Span-scope aliases | +| `{{*}}` | Entire session, trace, or span payload as JSON | The autocomplete dropdown opens after you type `{{` and lists fields available on the selected sample. -## Span-scope syntax +## Session-scope syntax -Span-scope evaluations expose a single span per evaluation. Reference fields by their JSON path on the span. +Session-scope evaluations expose every trace in the [user session][1] under the `traces` array. Each trace includes its own `spans` array, so you can read across traces and spans in one prompt. Use `{{traces...}}` paths (and nested `{{traces...].spans...}}` paths) to build session-level judges. The `{{span_input}}` and `{{span_output}}` aliases are not available in session scope. -### Built-in aliases +Session-level evaluations require spans to be tagged with a `session_id`. See [Tracking user sessions][1] to instrument your application. A session is considered complete after **30 minutes** of inactivity (no new spans for that session, measured from the most recent span); the evaluation runs once at that point with every trace and span from the session. Spans that arrive more than 30 minutes after the previous span are not included. See [Session-Level Evaluations][2] for configuration, example prompts, and when to choose session scope over trace or span scope. 
-| Alias | Resolves to | -|---|---| -| `{{span_input}}` | `meta.input.messages[*].content` for LLM spans, `meta.input.value` otherwise | -| `{{span_output}}` | `meta.output.messages[*].content` for LLM spans, `meta.output.value` otherwise | +### Reference the whole session -The aliases adapt to the kind of span being evaluated, so you don't have to branch on whether the span is an LLM call or an agent step. +``` +{{traces}} # JSON of every trace in the session (each trace includes its spans) +{{*}} # Entire session payload as JSON, including top-level metadata +``` -### Direct field paths +### Pick a trace or span by index ``` -{{name}} -{{meta.input.value}} -{{meta.output.value}} -{{metrics.input_tokens}} +{{traces[0].spans[0].meta.input.value}} # First span of the first trace +{{traces[*].spans[*].name}} # Newline-joined names of every span in the session +{{traces[1].spans}} # JSON of every span in the second trace ``` -### Array access +### Filter traces or spans by attribute ``` -{{meta.input.messages[0].content}} # First message only -{{meta.input.messages[*].content}} # All messages, joined with newlines -{{meta.input.messages[0,2].content}} # Inclusive range; out-of-bounds ends are clamped -{{meta.input.messages.content}} # Implicit fan-out, equivalent to [*] +{{traces[0].spans[name:my-span].meta.input.value}} +{{traces[*].spans[meta.span.kind:llm].meta.output.value}} +{{traces[meta.span.kind:llm].spans[*].meta.output.value}} +{{traces[meta.span.kind:tool].spans[*].meta.input.parameters}} +``` + +`[field.path:value]` on `traces` keeps only traces whose field at `field.path` equals `value`. The same filter syntax on `spans` (within a trace path) keeps only matching spans. Combine filters and deeper paths to extract inputs or outputs across the session. Filters fall back to an empty string when nothing matches. 
+ +### Fan-out across traces + +Use `[*]` on `traces` or `spans` the same way as in trace scope: values from every matching trace or span are collected and joined with newlines (`\n`), or serialized as JSON when the resolved values are objects. + +``` +{{traces[meta.span.kind:llm].meta.input.messages[*].content}} +{{traces[meta.span.kind:llm].meta.output.messages[*].content}} ``` ## Trace-scope syntax -Trace-scope evaluations expose every span in the trace under the `spans` array. Use `{{spans...}}` paths to read across spans. The `{{span_input}}` and `{{span_output}}` aliases are not available in trace scope. +Trace-scope evaluations expose every span in the trace under the `spans` array. Use `{{spans...}}` paths to read across spans. The `{{span_input}}` and `{{span_output}}` aliases are not available in trace scope. See [Trace-Level Evaluations][3] for configuration, example prompts, and when to choose trace scope. ### Reference the whole trace @@ -96,45 +106,35 @@ Trace-scope evaluations expose every span in the trace under the `spans` array. `[field.path:value]` keeps only the spans whose field at `field.path` equals `value`. Combine with deeper paths to extract the inputs or outputs of the matching spans. The filter falls back to an empty string if no span matches. -## Session-scope syntax - -Session-scope evaluations expose every trace in the [user session][1] under the `traces` array. Each trace includes its own `spans` array, so you can read across traces and spans in one prompt. Use `{{traces...}}` paths (and nested `{{traces...].spans...}}` paths) to build session-level judges. The `{{span_input}}` and `{{span_output}}` aliases are not available in session scope. - -Session-level evaluations require spans to be tagged with a `session_id`. See [Tracking user sessions][1] to instrument your application. 
A session is considered complete after **30 minutes** of inactivity (no new spans for that session, measured from the most recent span); the evaluation runs once at that point with every trace and span from the session. Spans that arrive more than 30 minutes after the previous span are not included. See [Session-Level Evaluations][2] for configuration, example prompts, and when to choose session scope over trace or span scope. +## Span-scope syntax -### Reference the whole session +Span-scope evaluations expose a single span per evaluation. Reference fields by their JSON path on the span. -``` -{{traces}} # JSON of every trace in the session (each trace includes its spans) -{{*}} # Entire session payload as JSON, including top-level metadata -``` +### Built-in aliases -### Pick a trace or span by index +| Alias | Resolves to | +|---|---| +| `{{span_input}}` | `meta.input.messages[*].content` for LLM spans, `meta.input.value` otherwise | +| `{{span_output}}` | `meta.output.messages[*].content` for LLM spans, `meta.output.value` otherwise | -``` -{{traces[0].spans[0].meta.input.value}} # First span of the first trace -{{traces[*].spans[*].name}} # Newline-joined names of every span in the session -{{traces[1].spans}} # JSON of every span in the second trace -``` +The aliases adapt to the kind of span being evaluated, so you don't have to branch on whether the span is an LLM call or an agent step. -### Filter traces or spans by attribute +### Direct field paths ``` -{{traces[0].spans[name:my-span].meta.input.value}} -{{traces[*].spans[meta.span.kind:llm].meta.output.value}} -{{traces[meta.span.kind:llm].spans[*].meta.output.value}} -{{traces[meta.span.kind:tool].spans[*].meta.input.parameters}} +{{name}} +{{meta.input.value}} +{{meta.output.value}} +{{metrics.input_tokens}} ``` -`[field.path:value]` on `traces` keeps only traces whose field at `field.path` equals `value`. The same filter syntax on `spans` (within a trace path) keeps only matching spans. 
Combine filters and deeper paths to extract inputs or outputs across the session. Filters fall back to an empty string when nothing matches. - -### Fan-out across traces - -Use `[*]` on `traces` or `spans` the same way as in trace scope: values from every matching trace or span are collected and joined with newlines (`\n`), or serialized as JSON when the resolved values are objects. +### Array access ``` -{{traces[meta.span.kind:llm].meta.input.messages[*].content}} -{{traces[meta.span.kind:llm].meta.output.messages[*].content}} +{{meta.input.messages[0].content}} # First message only +{{meta.input.messages[*].content}} # All messages, joined with newlines +{{meta.input.messages[0,2].content}} # Inclusive range; out-of-bounds ends are clamped +{{meta.input.messages.content}} # Implicit fan-out, equivalent to [*] ``` ## Resolution rules @@ -166,8 +166,8 @@ For example, given a span where `meta.input.messages` is: ## Tips -- Type `{{` in the prompt editor to open the autocomplete dropdown. The list adapts to the scope (span, trace, or session) and to the sample selected on the right. -- Pick a sample in the panel on the right—{{< ui >}}Filtered Spans{{< /ui >}} (span scope), {{< ui >}}Spans in Selected Trace{{< /ui >}} (trace scope), or the sample session pane listing traces in the session (session scope)—then click {{< ui >}}Test Evaluation{{< /ui >}} to preview how each placeholder resolves on real data before saving the configuration. +- Type `{{` in the prompt editor to open the autocomplete dropdown. The list adapts to the scope (session, trace, or span) and to the sample selected on the right. +- Pick a sample in the panel on the right—the sample session pane listing traces in the session (session scope), {{< ui >}}Spans in Selected Trace{{< /ui >}} (trace scope), or {{< ui >}}Filtered Spans{{< /ui >}} (span scope)—then click {{< ui >}}Test Evaluation{{< /ui >}} to preview how each placeholder resolves on real data before saving the configuration. 
- Use the three-dots menu on a sample's JSON view and select {{< ui >}}Add variable to message{{< /ui >}} to insert a field path into the prompt without typing it. - Pass `{{*}}` when you want the LLM judge to see the full payload—useful for free-form prompts that decide for themselves which fields matter. - Prefer `{{traces}}` or targeted `{{traces...].spans...}}` paths for session judges when you need cross-turn context; use `{{spans}}` when a single trace is enough. See [Session-Level Evaluations][2] for scope guidance and example prompts. @@ -178,3 +178,4 @@ For example, given a span where `meta.input.messages` is: [1]: /llm_observability/instrumentation/sdk/#tracking-user-sessions [2]: /llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations +[3]: /llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations \ No newline at end of file
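
Reviewer note: the path rules documented in `prompt_templating.md` above (index, inclusive range, `[*]` fan-out, implicit fan-out, `[field.path:value]` filters, newline joining, JSON for objects, empty string when nothing matches) can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not the product's implementation: the names `resolve`, `fill`, `_get`, `_select`, and `_render` are hypothetical, the payload shape is invented for the example, and filter values are assumed to be compared as strings.

```python
import json
import re

# Hypothetical resolver illustrating the documented rules; not the real implementation.
PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")
# A path segment is either a plain key or one bracketed selector.
SEGMENT = re.compile(r"([^.\[\]]+)|\[([^\]]*)\]")


def _get(obj, dotted):
    """Follow a plain dotted path; return None when any key is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def _select(items, selector):
    """Apply one bracket selector ([*], [path:value], [lo,hi], or [i]) to a list."""
    if selector == "*":
        return list(items)
    if ":" in selector:  # filter: keep items whose field equals the value
        path, _, want = selector.partition(":")
        return [it for it in items if str(_get(it, path)) == want]
    if "," in selector:  # inclusive range; slicing clamps out-of-bounds ends
        lo, hi = (int(part) for part in selector.split(","))
        return items[max(lo, 0): hi + 1]
    idx = int(selector)
    return [items[idx]] if 0 <= idx < len(items) else []


def _render(values):
    """Empty string for no match, JSON for objects, newline-joined otherwise."""
    if not values:
        return ""
    if any(isinstance(v, (dict, list)) for v in values):
        return json.dumps(values[0] if len(values) == 1 else values)
    return "\n".join(str(v) for v in values)


def resolve(path, payload):
    """Resolve one field path against a span, trace, or session payload."""
    if path == "*":
        return json.dumps(payload)
    values = [payload]
    for name, selector in SEGMENT.findall(path):
        nxt = []
        for v in values:
            if name:
                # Implicit fan-out: a key applied to a list maps over its elements.
                targets = v if isinstance(v, list) else [v]
                nxt.extend(t[name] for t in targets
                           if isinstance(t, dict) and name in t)
            elif isinstance(v, list):
                nxt.extend(_select(v, selector))
        values = nxt
    return _render(values)


def fill(template, payload):
    """Replace every {{ path }} placeholder in a prompt template."""
    return PLACEHOLDER.sub(lambda m: resolve(m.group(1), payload), template)


if __name__ == "__main__":
    # Invented trace-scope payload, for illustration only.
    trace = {"spans": [
        {"name": "my-span", "meta": {"span": {"kind": "llm"}, "input": {"value": "q1"}}},
        {"name": "other", "meta": {"span": {"kind": "tool"}, "input": {"value": "q2"}}},
    ]}
    print(fill("Names:\n{{spans[*].name}}", trace))
    print(fill("Input: {{spans[name:my-span].meta.input.value}}", trace))
```

Keeping the intermediate result as a flat list of values at every step is what makes explicit `[*]` and implicit fan-out behave identically, and it lets the same loop handle nested session paths such as `traces[*].spans[*].name` without special cases.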