[MLOB-7510] session-level documentation 2 #37123
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,69 +1,86 @@ | ||||||||||
| --- | ||||||||||
| title: Prompt Templating | ||||||||||
| description: Reference for the templating used in custom LLM-as-a-judge evaluation prompts—variables, array operators, span filters, and resolution rules. | ||||||||||
| description: Reference for the templating used in custom LLM-as-a-judge evaluation prompts—variables, array operators, span and trace filters, session paths, and resolution rules. | ||||||||||
| further_reading: | ||||||||||
| - link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations" | ||||||||||
| tag: "Documentation" | ||||||||||
| text: "Custom LLM-as-a-Judge Evaluations" | ||||||||||
| - link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations" | ||||||||||
| tag: "Documentation" | ||||||||||
| text: "Session-Level Evaluations" | ||||||||||
| - link: "/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations" | ||||||||||
| tag: "Documentation" | ||||||||||
| text: "Trace-Level Evaluations" | ||||||||||
| --- | ||||||||||
|
|
||||||||||
| Custom LLM-as-a-judge prompts inject span or trace data into the {{< ui >}}User{{< /ui >}} message by wrapping a field path in `{{ ... }}`. The System Prompt holds the static instructions to the LLM judge and does not resolve placeholders. The same syntax works in both the test pane and at evaluation time. | ||||||||||
| Custom LLM-as-a-judge prompts inject session, trace, or span data into the {{< ui >}}User{{< /ui >}} message by wrapping a field path in `{{ ... }}`. The System Prompt holds the static instructions to the LLM judge and does not resolve placeholders. The same syntax works in both the test pane and at evaluation time. Which paths are available depends on the evaluation scope you choose—session, trace, or span. | ||||||||||
|
|
||||||||||
| ## At a glance | ||||||||||
|
|
||||||||||
| | Pattern | Description | | ||||||||||
|
Contributor
Worth a quick sanity check: the |
||||||||||
| |---|---| | ||||||||||
| | `{{name}}` | Direct field | | ||||||||||
| | `{{meta.input.value}}` | Dot notation for nested fields | | ||||||||||
| | `{{meta.input.messages[0].content}}` | Array index (0-based) | | ||||||||||
| | `{{meta.input.messages[1,3].content}}` | Inclusive array range | | ||||||||||
| | `{{meta.input.messages[*].content}}` | Array wildcard (fan-out) | | ||||||||||
| | `{{meta.input.messages.content}}` | Implicit fan-out (same as `[*]`) | | ||||||||||
| | `{{span_input}}`, `{{span_output}}` | Span-scope aliases | | ||||||||||
| | `{{traces}}` | Every trace in the session as JSON (session scope) | | ||||||||||
| | `{{traces[0].spans[0].meta.input.value}}` | First span of the first trace (session scope) | | ||||||||||
| | `{{traces[*].spans[*].name}}` | Fan-out across traces and spans (session scope) | | ||||||||||
| | `{{traces[meta.span.kind:llm].spans[*].meta.output.value}}` | Filter spans by attribute across a session (session scope) | | ||||||||||
| | `{{spans}}` | Every span in the trace as JSON (trace scope) | | ||||||||||
| | `{{spans[0].name}}` | Pick one span from a trace (trace scope) | | ||||||||||
| | `{{spans[name:my-span].meta.input.value}}` | Filter spans by attribute (trace scope) | | ||||||||||
| | `{{spans}}` | Every span in the trace as JSON (trace scope) | | ||||||||||
| | `{{*}}` | Entire span or trace payload as JSON | | ||||||||||
| | `{{name}}` | Direct field (span scope) | | ||||||||||
| | `{{meta.input.value}}` | Dot notation for nested fields (span scope) | | ||||||||||
| | `{{meta.input.messages[0].content}}` | Array index (0-based) (span scope) | | ||||||||||
| | `{{meta.input.messages[1,3].content}}` | Inclusive array range (span scope) | | ||||||||||
| | `{{meta.input.messages[*].content}}` | Array wildcard (fan-out) (span scope) | | ||||||||||
| | `{{meta.input.messages.content}}` | Implicit fan-out (same as `[*]`) (span scope) | | ||||||||||
| | `{{span_input}}`, `{{span_output}}` | Span-scope aliases | | ||||||||||
| | `{{*}}` | Entire session, trace, or span payload as JSON | | ||||||||||
|
Comment on lines +35 to +36
Contributor
Two rows break the
Suggested change
|
||||||||||
|
|
||||||||||
| The autocomplete dropdown opens after you type `{{` and lists fields available on the selected sample. | ||||||||||
|
|
||||||||||
| ## Span-scope syntax | ||||||||||
| ## Session-scope syntax | ||||||||||
|
|
||||||||||
| Span-scope evaluations expose a single span per evaluation. Reference fields by their JSON path on the span. | ||||||||||
| Session-scope evaluations expose every trace in the [user session][1] under the `traces` array. Each trace includes its own `spans` array, so you can read across traces and spans in one prompt. Use `{{traces...}}` paths (and nested `{{traces[...].spans...}}` paths) to build session-level judges. The `{{span_input}}` and `{{span_output}}` aliases are not available in session scope. | ||||||||||
|
Contributor
The bracket notation
Suggested change
|
||||||||||
|
|
||||||||||
| ### Built-in aliases | ||||||||||
| Session-level evaluations require spans to be tagged with a `session_id`. See [Tracking user sessions][1] to instrument your application. A session is considered complete after **30 minutes** of inactivity (no new spans for that session, measured from the most recent span); the evaluation runs once at that point with every trace and span from the session. Spans that arrive more than 30 minutes after the previous span are not included. See [Session-Level Evaluations][2] for configuration, example prompts, and when to choose session scope over trace or span scope. | ||||||||||
|
Contributor
The session lifecycle mechanics (30-min timeout, late-span behavior) are conceptual details that belong on the Session-Level Evaluations page, not a syntax reference. Suggest trimming to just the prerequisite and the cross-ref:
Suggested change
|
||||||||||
|
|
||||||||||
| | Alias | Resolves to | | ||||||||||
| |---|---| | ||||||||||
| | `{{span_input}}` | `meta.input.messages[*].content` for LLM spans, `meta.input.value` otherwise | | ||||||||||
| | `{{span_output}}` | `meta.output.messages[*].content` for LLM spans, `meta.output.value` otherwise | | ||||||||||
| ### Reference the whole session | ||||||||||
|
|
||||||||||
| The aliases adapt to the kind of span being evaluated, so you don't have to branch on whether the span is an LLM call or an agent step. | ||||||||||
| ``` | ||||||||||
| {{traces}} # JSON of every trace in the session (each trace includes its spans) | ||||||||||
| {{*}} # Entire session payload as JSON, including top-level metadata | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ### Direct field paths | ||||||||||
| ### Pick a trace or span by index | ||||||||||
|
|
||||||||||
| ``` | ||||||||||
| {{name}} | ||||||||||
| {{meta.input.value}} | ||||||||||
| {{meta.output.value}} | ||||||||||
| {{metrics.input_tokens}} | ||||||||||
| {{traces[0].spans[0].meta.input.value}} # First span of the first trace | ||||||||||
| {{traces[*].spans[*].name}} # Newline-joined names of every span in the session | ||||||||||
| {{traces[1].spans}} # JSON of every span in the second trace | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ### Array access | ||||||||||
| ### Filter traces or spans by attribute | ||||||||||
|
Contributor
A few subsections have their explanatory text in the wrong place or missing:
The self-evident sections — "Reference the whole session/trace" and "Pick by index" — read fine without lead-ins. |
||||||||||
|
|
||||||||||
| ``` | ||||||||||
| {{meta.input.messages[0].content}} # First message only | ||||||||||
| {{meta.input.messages[*].content}} # All messages, joined with newlines | ||||||||||
| {{meta.input.messages[0,2].content}} # Inclusive range; out-of-bounds ends are clamped | ||||||||||
| {{meta.input.messages.content}} # Implicit fan-out, equivalent to [*] | ||||||||||
| {{traces[0].spans[name:my-span].meta.input.value}} | ||||||||||
| {{traces[*].spans[meta.span.kind:llm].meta.output.value}} | ||||||||||
| {{traces[meta.span.kind:llm].spans[*].meta.output.value}} | ||||||||||
| {{traces[meta.span.kind:tool].spans[*].meta.input.parameters}} | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| `[field.path:value]` on `traces` keeps only traces whose field at `field.path` equals `value`. The same filter syntax on `spans` (within a trace path) keeps only matching spans. Combine filters and deeper paths to extract inputs or outputs across the session. Filters fall back to an empty string when nothing matches. | ||||||||||
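The filter rule described above can be sketched in Python. This is only an illustration of the documented behavior, not the evaluator's actual implementation; `get_path` and `filter_items` are hypothetical helper names, and the sample span data is invented for the example.

```python
from typing import Any


def get_path(obj: Any, path: str) -> Any:
    """Walk a dot-separated path through nested dicts; return None if a key is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def filter_items(items: list, expr: str) -> list:
    """Keep items whose field at `field.path` equals `value`, for an expr like 'name:my-span'."""
    field, _, value = expr.partition(":")
    kept = []
    for item in items:
        found = get_path(item, field)
        if found is not None and str(found) == value:
            kept.append(item)
    return kept


# Invented sample data: two spans from one trace.
spans = [
    {"name": "retrieve", "meta": {"span": {"kind": "tool"}, "input": {"value": "query"}}},
    {"name": "answer", "meta": {"span": {"kind": "llm"}, "output": {"value": "42"}}},
]

# Roughly what {{spans[meta.span.kind:llm].meta.output.value}} asks for.
llm_outputs = [get_path(s, "meta.output.value") for s in filter_items(spans, "meta.span.kind:llm")]

# A filter with no match yields nothing, which renders as an empty string.
no_match = filter_items(spans, "name:missing")
rendered_no_match = "\n".join(str(get_path(s, "name")) for s in no_match)
```

String comparison on the field value is an assumption made here for simplicity; the source only says the field must equal the value.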
|
|
||||||||||
| ### Fan-out across traces | ||||||||||
|
|
||||||||||
| Use `[*]` on `traces` or `spans` the same way as in trace scope: values from every matching trace or span are collected and joined with newlines (`\n`), or serialized as JSON when the resolved values are objects. | ||||||||||
|
Contributor
"The same way as in trace scope" is a backwards reference: session-scope now appears before trace-scope, so the reader hasn't seen it yet. The sentence also immediately explains the behavior anyway, so the phrase adds nothing.
Suggested change
|
||||||||||
|
|
||||||||||
| ``` | ||||||||||
| {{traces[meta.span.kind:llm].meta.input.messages[*].content}} | ||||||||||
| {{traces[meta.span.kind:llm].meta.output.messages[*].content}} | ||||||||||
| ``` | ||||||||||
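As a rough illustration of the fan-out behavior described in this section, the sketch below collects values across traces and spans and renders them. The `render` helper is hypothetical, and serializing the whole collected list as one JSON array is an assumption; the source only states that object values are serialized as JSON and scalar values are joined with newlines.

```python
import json


def render(values: list) -> str:
    """Join scalar values with newlines; serialize as JSON when any value is an object."""
    if not values:
        return ""
    if any(isinstance(v, (dict, list)) for v in values):
        return json.dumps(values)
    return "\n".join(str(v) for v in values)


# Invented sample session with two traces.
session = {
    "traces": [
        {"spans": [{"name": "plan"}, {"name": "search"}]},
        {"spans": [{"name": "answer"}]},
    ]
}

# Roughly what {{traces[*].spans[*].name}} collects.
names = [span["name"] for trace in session["traces"] for span in trace["spans"]]
joined_names = render(names)

# Roughly what {{traces[1].spans}} collects: objects, so JSON is used instead.
second_trace_spans = render(session["traces"][1]["spans"])
```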
|
|
||||||||||
| ## Trace-scope syntax | ||||||||||
|
|
||||||||||
| Trace-scope evaluations expose every span in the trace under the `spans` array. Use `{{spans...}}` paths to read across spans. The `{{span_input}}` and `{{span_output}}` aliases are not available in trace scope. | ||||||||||
| Trace-scope evaluations expose every span in the trace under the `spans` array. Use `{{spans...}}` paths to read across spans. The `{{span_input}}` and `{{span_output}}` aliases are not available in trace scope. See [Trace-Level Evaluations][3] for configuration, example prompts, and when to choose trace scope. | ||||||||||
|
|
||||||||||
| ### Reference the whole trace | ||||||||||
|
|
||||||||||
|
|
@@ -89,6 +106,37 @@ | |||||||||
|
|
||||||||||
| `[field.path:value]` keeps only the spans whose field at `field.path` equals `value`. Combine with deeper paths to extract the inputs or outputs of the matching spans. The filter falls back to an empty string if no span matches. | ||||||||||
|
|
||||||||||
| ## Span-scope syntax | ||||||||||
|
|
||||||||||
| Span-scope evaluations expose a single span per evaluation. Reference fields by their JSON path on the span. | ||||||||||
|
|
||||||||||
| ### Built-in aliases | ||||||||||
|
|
||||||||||
| | Alias | Resolves to | | ||||||||||
| |---|---| | ||||||||||
| | `{{span_input}}` | `meta.input.messages[*].content` for LLM spans, `meta.input.value` otherwise | | ||||||||||
| | `{{span_output}}` | `meta.output.messages[*].content` for LLM spans, `meta.output.value` otherwise | | ||||||||||
|
|
||||||||||
| The aliases adapt to the kind of span being evaluated, so you don't have to branch on whether the span is an LLM call or an agent step. | ||||||||||
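The alias table above can be read as a small branching rule. The sketch below shows one way that rule might look, assuming the span kind is stored at `meta.span.kind` (the path used by the filter examples in this reference); `resolve_alias` is a hypothetical name, not part of any SDK.

```python
def resolve_alias(span: dict, direction: str) -> str:
    """Resolve span_input ('input') or span_output ('output') per the alias table."""
    meta = span.get("meta", {})
    payload = meta.get(direction, {})
    if meta.get("span", {}).get("kind") == "llm":
        # LLM spans: meta.<direction>.messages[*].content, joined with newlines.
        return "\n".join(str(m.get("content", "")) for m in payload.get("messages", []))
    # Any other span kind: meta.<direction>.value.
    return str(payload.get("value", ""))


# Invented sample spans.
llm_span = {
    "meta": {
        "span": {"kind": "llm"},
        "input": {"messages": [{"content": "You are helpful."}, {"content": "Hi"}]},
    }
}
agent_span = {"meta": {"span": {"kind": "agent"}, "input": {"value": "book a flight"}}}

llm_input = resolve_alias(llm_span, "input")
agent_input = resolve_alias(agent_span, "input")
```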
|
|
||||||||||
| ### Direct field paths | ||||||||||
|
|
||||||||||
| ``` | ||||||||||
| {{name}} | ||||||||||
| {{meta.input.value}} | ||||||||||
| {{meta.output.value}} | ||||||||||
| {{metrics.input_tokens}} | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ### Array access | ||||||||||
|
|
||||||||||
| ``` | ||||||||||
| {{meta.input.messages[0].content}} # First message only | ||||||||||
| {{meta.input.messages[*].content}} # All messages, joined with newlines | ||||||||||
| {{meta.input.messages[0,2].content}} # Inclusive range; out-of-bounds ends are clamped | ||||||||||
| {{meta.input.messages.content}} # Implicit fan-out, equivalent to [*] | ||||||||||
| ``` | ||||||||||
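The selectors in the block above (single index, inclusive range with clamped ends, and wildcard) can be modeled with a short function. This is a sketch of the documented semantics only; `select` is a hypothetical helper, and returning an empty list for an out-of-bounds single index is an assumption.

```python
def select(items: list, selector: str) -> list:
    """Apply a bracket selector such as '0', '0,2', or '*' to a list."""
    if selector == "*":
        return list(items)
    if "," in selector:
        start, end = (int(part) for part in selector.split(","))
        # Inclusive range; out-of-bounds ends are clamped to the list.
        start = max(start, 0)
        end = min(end, len(items) - 1)
        return items[start:end + 1]
    index = int(selector)
    return [items[index]] if 0 <= index < len(items) else []


# Invented sample message contents.
contents = ["system prompt", "user question", "assistant reply"]

first_only = select(contents, "0")
clamped_range = select(contents, "1,5")
everything = "\n".join(select(contents, "*"))
```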
|
|
||||||||||
| ## Resolution rules | ||||||||||
|
|
||||||||||
| | Result | Behavior | | ||||||||||
|
|
@@ -118,11 +166,16 @@ | |||||||||
|
|
||||||||||
| ## Tips | ||||||||||
|
|
||||||||||
| - Type `{{` in the prompt editor to open the autocomplete dropdown. The list adapts to the scope (span or trace) and to the sample selected on the right. | ||||||||||
| - Pick a sample row in the {{< ui >}}Filtered Spans{{< /ui >}} panel (span scope) or the {{< ui >}}Spans in Selected Trace{{< /ui >}} panel (trace scope), then click {{< ui >}}Test Evaluation{{< /ui >}} to preview how each placeholder resolves on real data before saving the configuration. | ||||||||||
| - Type `{{` in the prompt editor to open the autocomplete dropdown. The list adapts to the scope (session, trace, or span) and to the sample selected on the right. | ||||||||||
| - Pick a sample in the panel on the right—the sample session pane listing traces in the session (session scope), {{< ui >}}Spans in Selected Trace{{< /ui >}} (trace scope), or {{< ui >}}Filtered Spans{{< /ui >}} (span scope)—then click {{< ui >}}Test Evaluation{{< /ui >}} to preview how each placeholder resolves on real data before saving the configuration. | ||||||||||
|
Contributor
The session panel name is written as prose while the other two panel names in the same list use the
Suggested change
|
||||||||||
| - Use the three-dots menu on a sample's JSON view and select {{< ui >}}Add variable to message{{< /ui >}} to insert a field path into the prompt without typing it. | ||||||||||
| - Pass `{{*}}` when you want the LLM judge to see the full payload—useful for free-form prompts that decide for themselves which fields matter. | ||||||||||
| - Prefer `{{traces}}` or targeted `{{traces[...].spans...}}` paths for session judges when you need cross-turn context; use `{{spans}}` when a single trace is enough. See [Session-Level Evaluations][2] for scope guidance and example prompts. | ||||||||||
|
Contributor
Same mismatched bracket as line 42:
Suggested change
|
||||||||||
|
|
||||||||||
| ## Further Reading | ||||||||||
|
|
||||||||||
| {{< partial name="whats-next/whats-next.html" >}} | ||||||||||
|
|
||||||||||
| [1]: /llm_observability/instrumentation/sdk/#tracking-user-sessions | ||||||||||
| [2]: /llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations | ||||||||||
| [3]: /llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations | ||||||||||
Minor — "Which paths are available depends on..." is a bit indirect. Suggestion: