Add LogfireSink for pydantic-evals online evaluation integration#1804
Add LogfireSink for pydantic-evals online evaluation integration#1804
Conversation
Introduce `LogfireSink` that implements the `EvaluationSink` protocol from pydantic-evals, sending online evaluation results to the Logfire annotations HTTP API. Auto-configured by `logfire.configure()` when pydantic-evals is installed. - `AnnotationsClient`: async HTTP client for `POST /v1/annotations` with retry on 5xx/timeout - `LogfireSink`: maps `EvaluationResult`/`EvaluatorFailure` to annotation payloads with idempotency keys - `create_annotation()` / `create_annotation_sync()`: user-facing HTTP-based annotation API as a non-OTEL alternative to `record_feedback()` - Deprecate `raw_annotate_span()` and `record_feedback()` in favor of the new HTTP-based API Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deploying logfire-docs with
|
| Latest commit: |
30009c4
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://ee59f0da.logfire-docs.pages.dev |
| Branch Preview URL: | https://feat-logfire-evaluation-sink.logfire-docs.pages.dev |
| class AnnotationsClient: | ||
| """Async HTTP client for the Logfire annotations API. | ||
|
|
||
| Uses write tokens (same as ingest) for authentication. | ||
| """ | ||
|
|
||
| def __init__( | ||
| self, | ||
| base_url: str, | ||
| token: str, | ||
| timeout: httpx.Timeout = DEFAULT_TIMEOUT, | ||
| *, | ||
| client: httpx.AsyncClient | None = None, | ||
| ) -> None: | ||
| self._client = client or httpx.AsyncClient( | ||
| base_url=base_url, | ||
| headers={'Authorization': token}, | ||
| timeout=timeout, | ||
| ) |
There was a problem hiding this comment.
Instead of this, shouldn't we be using the current LogfireAPIClient under experimental?
There was a problem hiding this comment.
yes please merge with that, hopefully we can benefit from streamlining of API keys etc. too
| asyncio.run( | ||
| create_annotation( | ||
| trace_id=trace_id, | ||
| span_id=span_id, | ||
| name=name, | ||
| value=value, | ||
| annotation_type=annotation_type, | ||
| comment=comment, | ||
| source=source, | ||
| source_name=source_name, | ||
| idempotency_key=idempotency_key, | ||
| metadata=metadata, | ||
| ) | ||
| ) |
There was a problem hiding this comment.
🔴 create_annotation_sync crashes with RuntimeError when called from a running event loop
create_annotation_sync() uses asyncio.run() at logfire/experimental/annotations_api.py:99, which raises RuntimeError('asyncio.run() cannot be called from a running event loop') when there is already an active event loop. This is a common scenario in async web frameworks (FastAPI, Starlette, Django async views, etc.), making the sync wrapper unusable in many real-world contexts where users would most want a sync convenience function.
Prompt for agents
In logfire/experimental/annotations_api.py, the `create_annotation_sync()` function at line 99 uses `asyncio.run()` which fails when a running event loop already exists. Replace this with a pattern that handles both cases: when there's no running loop (use `asyncio.run()`) and when there is a running loop (e.g., schedule the coroutine on the existing loop from a thread, or use a synchronous HTTP client like `httpx.Client` instead of the async one). One robust approach is to create a separate sync code path using `httpx.Client` (not `httpx.AsyncClient`) so that `create_annotation_sync` doesn't need asyncio at all.
Was this helpful? React with 👍 or 👎 to provide feedback.
| if config.send_to_logfire and config.token: | ||
| _try_configure_online_evals(config.token, config.advanced) | ||
|
|
||
| return logfire_instance | ||
|
|
||
|
|
||
| def _try_configure_online_evals(token: str | list[str], advanced: AdvancedOptions | None) -> None: | ||
| """Auto-configure pydantic-evals LogfireSink if pydantic-evals is installed.""" | ||
| try: | ||
| _online_mod = __import__('pydantic_evals.online', fromlist=['DEFAULT_CONFIG']) | ||
| except ImportError: | ||
| return # pydantic-evals not installed | ||
|
|
||
| evals_config: Any = _online_mod.DEFAULT_CONFIG | ||
|
|
||
| # Don't override if the user has already configured a sink | ||
| if evals_config.default_sink is not None: | ||
| return | ||
|
|
||
| from logfire._internal.annotations_client import AnnotationsClient | ||
| from logfire.experimental.evaluation import LogfireSink | ||
|
|
||
| write_token = token[0] if isinstance(token, list) else token | ||
| base_url = advanced.base_url if advanced and advanced.base_url else get_base_url_from_token(write_token) | ||
|
|
||
| client = AnnotationsClient(base_url=base_url, token=write_token) | ||
| sink = LogfireSink(client=client) | ||
| evals_config.default_sink = sink |
There was a problem hiding this comment.
🚩 _try_configure_online_evals silently mutates global pydantic-evals config as a side effect of logfire.configure()
At logfire/_internal/config.py:615-616, _try_configure_online_evals is called during configure(), which sets evals_config.default_sink on the pydantic-evals global config (DEFAULT_CONFIG). This is a side effect that may surprise users — calling logfire.configure() now silently changes pydantic-evals behavior. If logfire.configure() is called multiple times (e.g., in tests), each call creates a new AnnotationsClient and overwrites the sink, potentially leaking the previous client since it's never closed. The guard at line 631 (if evals_config.default_sink is not None: return) only prevents overwrite if a sink was already set, but after the first logfire.configure() call, the sink IS set, so subsequent calls won't update it even if the token/base_url changes.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
3 issues found across 11 files
Confidence score: 3/5
- There is some merge risk because
logfire/_internal/config.pyhas a concrete behavior regression: laterconfigure()calls may not update token/base URL when an auto-installed sink is present, which can cause configuration changes to be silently ignored. logfire/experimental/annotations.pyintroduces duplicate deprecation warnings inrecord_feedback(), which is user-facing noise but likely not functionally blocking.logfire/_internal/annotations_client.pylogs the wrong exception on retry failure, which can mislead debugging and incident triage even if runtime behavior is otherwise unchanged.- Pay close attention to
logfire/_internal/config.py,logfire/experimental/annotations.py, andlogfire/_internal/annotations_client.py- sink reconfiguration, warning duplication, and retry error reporting should be validated before merge.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="logfire/experimental/annotations.py">
<violation number="1" location="logfire/experimental/annotations.py:83">
P2: Calling `record_feedback()` now emits duplicate deprecation warnings because it warns itself and then calls `raw_annotate_span()`, which also warns.</violation>
</file>
<file name="logfire/_internal/annotations_client.py">
<violation number="1" location="logfire/_internal/annotations_client.py:43">
P2: The retry-failure log uses the first request exception instead of the retry exception, so it can report the wrong error cause.</violation>
</file>
<file name="logfire/_internal/config.py">
<violation number="1" location="logfire/_internal/config.py:631">
P2: The sink guard is too broad: it prevents reconfiguring an auto-installed LogfireSink on later `configure()` calls, so token/base URL changes are ignored.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| evals_config: Any = _online_mod.DEFAULT_CONFIG | ||
|
|
||
| # Don't override if the user has already configured a sink | ||
| if evals_config.default_sink is not None: |
There was a problem hiding this comment.
P2: The sink guard is too broad: it prevents reconfiguring an auto-installed LogfireSink on later configure() calls, so token/base URL changes are ignored.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At logfire/_internal/config.py, line 631:
<comment>The sink guard is too broad: it prevents reconfiguring an auto-installed LogfireSink on later `configure()` calls, so token/base URL changes are ignored.</comment>
<file context>
@@ -612,9 +612,36 @@ def configure(
+ evals_config: Any = _online_mod.DEFAULT_CONFIG
+
+ # Don't override if the user has already configured a sink
+ if evals_config.default_sink is not None:
+ return
+
</file context>
…logging, and null comment - Extract _raw_annotate_span_impl to avoid duplicate deprecation warnings when record_feedback() calls raw_annotate_span() - Bind retry exception properly in annotations_client retry handler - Only include comment in failure annotations when error_stacktrace is present
| response = await self._client.post('/v1/annotations', json={'annotations': annotations}) | ||
| response.raise_for_status() | ||
| except Exception as retry_exc: | ||
| logfire.error('Annotations batch retry failed: {error}', _exc_info=retry_exc) |
There was a problem hiding this comment.
🟡 Missing error attribute in logfire.error call causes message formatting to fail
The logfire.error call at line 44 uses _exc_info=retry_exc but does not pass the error attribute needed by the {error} template placeholder. The _exc_info parameter is a special named parameter captured separately from **attributes (see logfire/_internal/main.py:464-494), so it does not populate an error key in the attributes dict. The formatter will fail to resolve {error}, triggering a KnownFormattingError that falls back to using the raw template string 'Annotations batch retry failed: {error}' as the message — losing the actual error details.
The correct pattern (used elsewhere in the codebase at logfire/variables/remote.py:166) is to pass both the attribute and _exc_info: logfire.error('...{error}', error=str(retry_exc), _exc_info=retry_exc).
| logfire.error('Annotations batch retry failed: {error}', _exc_info=retry_exc) | |
| logfire.error('Annotations batch retry failed: {error}', error=str(retry_exc), _exc_info=retry_exc) |
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
LogfireSinkimplementing pydantic-evals'EvaluationSinkprotocol, sending online eval results to the new/v1/annotationsHTTP APIAnnotationsClientasync HTTP client with retry on 5xx/timeout, using write token auth (same as OTLP ingest)LogfireSinkas the default sink inlogfire.configure()when pydantic-evals is installed and a token is presentcreate_annotation()/create_annotation_sync()as user-facing HTTP-based annotation APIraw_annotate_span()andrecord_feedback()in favor of the new HTTP pathDepends on the platform
/v1/annotationsendpoint (can be developed in parallel, must release after backend).Based on
dmontagu/online-eval-capabilitybranch in pydantic-ai for theEvaluationSinkprotocol.See
plan.local.mdfor the full design context.Test plan
AnnotationsClient(auth header, retry on 5xx, no retry on 4xx, close)LogfireSink(result/failure serialization, idempotency keys, None span_reference no-op, exception catching)create_annotation()APItest_annotations.pyupdated to handle deprecation warnings🤖 Generated with Claude Code
Summary by cubic
Adds
LogfireSinkto sendpydantic-evalsonline evaluation results to Logfire via the new/v1/annotationsHTTP API. Also adds a simple annotations API for manual feedback and auto-configures the sink duringlogfire.configure()when a token is set.New Features
LogfireSinkimplementsEvaluationSink, serializes values (assertion/score/label), includes failures, and uses deterministic idempotency keys.AnnotationsClientasync HTTP client with write-token auth and one retry on 5xx/timeout.logfire.experimental.annotations_api.create_annotation()andcreate_annotation_sync().Migration
raw_annotate_span()andrecord_feedback()are deprecated; use the new HTTP APIs./v1/annotationsendpoint; release after backend is live.Written for commit 489a62a. Summary will update on new commits.