
Add LogfireSink for pydantic-evals online evaluation integration#1804

Open
Kludex wants to merge 2 commits into main from feat/logfire-evaluation-sink

Conversation


@Kludex Kludex commented Mar 25, 2026

Summary

  • Add LogfireSink implementing pydantic-evals' EvaluationSink protocol, sending online eval results to the new /v1/annotations HTTP API
  • Add AnnotationsClient async HTTP client with retry on 5xx/timeout, using write token auth (same as OTLP ingest)
  • Auto-configure LogfireSink as the default sink in logfire.configure() when pydantic-evals is installed and a token is present
  • Add create_annotation() / create_annotation_sync() as user-facing HTTP-based annotation API
  • Deprecate raw_annotate_span() and record_feedback() in favor of the new HTTP path
  • Add logfire-api stubs for new modules
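The "retry on 5xx/timeout, no retry on 4xx" behavior described above can be sketched generically. This is a minimal illustration, not the actual `AnnotationsClient` code: `post_with_retry`, `RetryableError`, and `send` are hypothetical names standing in for the real HTTP call and its retryable failure modes.

```python
import asyncio


class RetryableError(Exception):
    """Stands in for a 5xx response or a timeout."""


async def post_with_retry(send, payload, retries=1, backoff=0.0):
    """Send a payload, retrying on retryable (5xx/timeout) failures only.

    `send` is any async callable. Non-retryable errors (a 4xx stand-in)
    propagate immediately, mirroring the PR's "retry on 5xx, no retry on
    4xx" rule; retryable errors get `retries` additional attempts.
    """
    for attempt in range(retries + 1):
        try:
            return await send(payload)
        except RetryableError:
            if attempt == retries:
                raise  # retries exhausted
            await asyncio.sleep(backoff)
```

With the default `retries=1`, a single transient failure is absorbed, while a client error fails fast after one attempt.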

Depends on the platform /v1/annotations endpoint (can be developed in parallel, but must be released after the backend).
Based on the dmontagu/online-eval-capability branch in pydantic-ai for the EvaluationSink protocol.

See plan.local.md for the full design context.

Test plan

  • Unit tests for AnnotationsClient (auth header, retry on 5xx, no retry on 4xx, close)
  • Unit tests for LogfireSink (result/failure serialization, idempotency keys, None span_reference no-op, exception catching)
  • Unit tests for create_annotation() API
  • Existing test_annotations.py updated to handle deprecation warnings
  • All pyright/ruff checks pass

🤖 Generated with Claude Code


Summary by cubic

Adds LogfireSink to send pydantic-evals online evaluation results to Logfire via the new /v1/annotations HTTP API. Also adds a simple annotations API for manual feedback and auto-configures the sink during logfire.configure() when a token is set.

  • New Features

    • LogfireSink implements EvaluationSink, serializes values (assertion/score/label), includes failures, and uses deterministic idempotency keys.
    • AnnotationsClient async HTTP client with write-token auth and one retry on 5xx/timeout.
    • New HTTP helpers: logfire.experimental.annotations_api.create_annotation() and create_annotation_sync().
  • Migration

    • raw_annotate_span() and record_feedback() are deprecated; use the new HTTP APIs.
    • Depends on the platform /v1/annotations endpoint; release after backend is live.

Written for commit 489a62a.

Introduce `LogfireSink` that implements the `EvaluationSink` protocol from
pydantic-evals, sending online evaluation results to the Logfire annotations
HTTP API. Auto-configured by `logfire.configure()` when pydantic-evals is
installed.

- `AnnotationsClient`: async HTTP client for `POST /v1/annotations` with
  retry on 5xx/timeout
- `LogfireSink`: maps `EvaluationResult`/`EvaluatorFailure` to annotation
  payloads with idempotency keys
- `create_annotation()` / `create_annotation_sync()`: user-facing HTTP-based
  annotation API as a non-OTEL alternative to `record_feedback()`
- Deprecate `raw_annotate_span()` and `record_feedback()` in favor of the
  new HTTP-based API
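As an illustration of the deterministic idempotency keys mentioned above, a minimal sketch. `annotation_idempotency_key` and the choice of key material are hypothetical; the real payload fields and key derivation may differ.

```python
import hashlib


def annotation_idempotency_key(trace_id: str, span_id: str, evaluator_name: str) -> str:
    """Derive a deterministic key from the span identity and evaluator name.

    Because the key depends only on its inputs, re-submitting the same
    evaluation result produces the same key, so the server can deduplicate
    instead of creating duplicate annotations.
    """
    material = f'{trace_id}:{span_id}:{evaluator_name}'.encode()
    return hashlib.sha256(material).hexdigest()
```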

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages bot commented Mar 25, 2026

Deploying logfire-docs with Cloudflare Pages

Latest commit: 30009c4
Status: ✅  Deploy successful!
Preview URL: https://ee59f0da.logfire-docs.pages.dev
Branch Preview URL: https://feat-logfire-evaluation-sink.logfire-docs.pages.dev


Comment on lines +12 to +30
class AnnotationsClient:
    """Async HTTP client for the Logfire annotations API.

    Uses write tokens (same as ingest) for authentication.
    """

    def __init__(
        self,
        base_url: str,
        token: str,
        timeout: httpx.Timeout = DEFAULT_TIMEOUT,
        *,
        client: httpx.AsyncClient | None = None,
    ) -> None:
        self._client = client or httpx.AsyncClient(
            base_url=base_url,
            headers={'Authorization': token},
            timeout=timeout,
        )
Member Author

Instead of this, shouldn't we be using the current LogfireAPIClient under experimental?

Contributor

yes please merge with that, hopefully we can benefit from streamlining of API keys etc. too

Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 5 potential issues.

View 1 additional finding in Devin Review.


Comment on lines +99 to +112
asyncio.run(
    create_annotation(
        trace_id=trace_id,
        span_id=span_id,
        name=name,
        value=value,
        annotation_type=annotation_type,
        comment=comment,
        source=source,
        source_name=source_name,
        idempotency_key=idempotency_key,
        metadata=metadata,
    )
)
Contributor


🔴 create_annotation_sync crashes with RuntimeError when called from a running event loop

create_annotation_sync() uses asyncio.run() at logfire/experimental/annotations_api.py:99, which raises RuntimeError('asyncio.run() cannot be called from a running event loop') when there is already an active event loop. This is a common scenario in async web frameworks (FastAPI, Starlette, Django async views, etc.), making the sync wrapper unusable in many real-world contexts where users would most want a sync convenience function.

Prompt for agents
In logfire/experimental/annotations_api.py, the `create_annotation_sync()` function at line 99 uses `asyncio.run()` which fails when a running event loop already exists. Replace this with a pattern that handles both cases: when there's no running loop (use `asyncio.run()`) and when there is a running loop (e.g., schedule the coroutine on the existing loop from a thread, or use a synchronous HTTP client like `httpx.Client` instead of the async one). One robust approach is to create a separate sync code path using `httpx.Client` (not `httpx.AsyncClient`) so that `create_annotation_sync` doesn't need asyncio at all.
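A minimal sketch of the thread-based approach the prompt suggests. `run_sync` is a hypothetical helper, not code from this PR: it falls back to `asyncio.run()` when no loop is running and hops to a worker thread when one is, so it works both from plain sync code and from inside an async framework.

```python
import asyncio
import concurrent.futures
from collections.abc import Coroutine
from typing import Any, TypeVar

T = TypeVar('T')


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion whether or not an event loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop in this thread: plain asyncio.run() is safe.
        return asyncio.run(coro)
    # A loop is already running (e.g. inside FastAPI): asyncio.run() would
    # raise RuntimeError here, so run the coroutine on a fresh loop in a
    # dedicated worker thread and block on its result.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

The alternative the prompt also mentions, a separate `httpx.Client`-based sync code path, avoids asyncio entirely and is arguably simpler; the sketch above is only one of the two options.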

Comment on lines +615 to +642
    if config.send_to_logfire and config.token:
        _try_configure_online_evals(config.token, config.advanced)

    return logfire_instance


def _try_configure_online_evals(token: str | list[str], advanced: AdvancedOptions | None) -> None:
    """Auto-configure pydantic-evals LogfireSink if pydantic-evals is installed."""
    try:
        _online_mod = __import__('pydantic_evals.online', fromlist=['DEFAULT_CONFIG'])
    except ImportError:
        return  # pydantic-evals not installed

    evals_config: Any = _online_mod.DEFAULT_CONFIG

    # Don't override if the user has already configured a sink
    if evals_config.default_sink is not None:
        return

    from logfire._internal.annotations_client import AnnotationsClient
    from logfire.experimental.evaluation import LogfireSink

    write_token = token[0] if isinstance(token, list) else token
    base_url = advanced.base_url if advanced and advanced.base_url else get_base_url_from_token(write_token)

    client = AnnotationsClient(base_url=base_url, token=write_token)
    sink = LogfireSink(client=client)
    evals_config.default_sink = sink
Contributor

🚩 _try_configure_online_evals silently mutates global pydantic-evals config as a side effect of logfire.configure()

At logfire/_internal/config.py:615-616, _try_configure_online_evals is called during configure(), which sets evals_config.default_sink on the pydantic-evals global config (DEFAULT_CONFIG). This is a side effect that may surprise users — calling logfire.configure() now silently changes pydantic-evals behavior. If logfire.configure() is called multiple times (e.g., in tests), each call creates a new AnnotationsClient and overwrites the sink, potentially leaking the previous client since it's never closed. The guard at line 631 (if evals_config.default_sink is not None: return) only prevents overwrite if a sink was already set, but after the first logfire.configure() call, the sink IS set, so subsequent calls won't update it even if the token/base_url changes.
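One possible shape for a narrower guard, sketched under assumptions: `install_default_sink`, `_auto_sink`, and `make_sink` are hypothetical names, not the PR's code. The idea is to remember which sink was auto-installed so later `configure()` calls can replace it (closing the old client instead of leaking it), while a sink the user set themselves is never touched.

```python
# Module-level record of the sink this code auto-installed, if any.
_auto_sink = None


def install_default_sink(evals_config, make_sink):
    """Install make_sink() as the default sink unless the user set their own.

    An auto-installed sink from an earlier configure() call is replaced
    (and closed) so token/base-URL changes take effect; a user-provided
    sink is left untouched.
    """
    global _auto_sink
    current = evals_config.default_sink
    if current is not None and current is not _auto_sink:
        return  # user-configured sink: leave it alone
    if current is not None:
        close = getattr(current, 'close', None)
        if close is not None:
            close()  # don't leak the previously auto-installed client
    _auto_sink = make_sink()
    evals_config.default_sink = _auto_sink
```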



@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 11 files

Confidence score: 3/5

  • There is some merge risk because logfire/_internal/config.py has a concrete behavior regression: later configure() calls may not update token/base URL when an auto-installed sink is present, which can cause configuration changes to be silently ignored.
  • logfire/experimental/annotations.py introduces duplicate deprecation warnings in record_feedback(), which is user-facing noise but likely not functionally blocking.
  • logfire/_internal/annotations_client.py logs the wrong exception on retry failure, which can mislead debugging and incident triage even if runtime behavior is otherwise unchanged.
  • Pay close attention to logfire/_internal/config.py, logfire/experimental/annotations.py, and logfire/_internal/annotations_client.py - sink reconfiguration, warning duplication, and retry error reporting should be validated before merge.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="logfire/experimental/annotations.py">

<violation number="1" location="logfire/experimental/annotations.py:83">
P2: Calling `record_feedback()` now emits duplicate deprecation warnings because it warns itself and then calls `raw_annotate_span()`, which also warns.</violation>
</file>

<file name="logfire/_internal/annotations_client.py">

<violation number="1" location="logfire/_internal/annotations_client.py:43">
P2: The retry-failure log uses the first request exception instead of the retry exception, so it can report the wrong error cause.</violation>
</file>

<file name="logfire/_internal/config.py">

<violation number="1" location="logfire/_internal/config.py:631">
P2: The sink guard is too broad: it prevents reconfiguring an auto-installed LogfireSink on later `configure()` calls, so token/base URL changes are ignored.</violation>
</file>


    evals_config: Any = _online_mod.DEFAULT_CONFIG

    # Don't override if the user has already configured a sink
    if evals_config.default_sink is not None:

@cubic-dev-ai cubic-dev-ai bot Mar 25, 2026


P2: The sink guard is too broad: it prevents reconfiguring an auto-installed LogfireSink on later configure() calls, so token/base URL changes are ignored.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At logfire/_internal/config.py, line 631:

<comment>The sink guard is too broad: it prevents reconfiguring an auto-installed LogfireSink on later `configure()` calls, so token/base URL changes are ignored.</comment>

<file context>
@@ -612,9 +612,36 @@ def configure(
+    evals_config: Any = _online_mod.DEFAULT_CONFIG
+
+    # Don't override if the user has already configured a sink
+    if evals_config.default_sink is not None:
+        return
+
</file context>

…logging, and null comment

- Extract _raw_annotate_span_impl to avoid duplicate deprecation warnings when
  record_feedback() calls raw_annotate_span()
- Bind retry exception properly in annotations_client retry handler
- Only include comment in failure annotations when error_stacktrace is present
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 4 additional findings in Devin Review.


        response = await self._client.post('/v1/annotations', json={'annotations': annotations})
        response.raise_for_status()
    except Exception as retry_exc:
        logfire.error('Annotations batch retry failed: {error}', _exc_info=retry_exc)
Contributor

🟡 Missing error attribute in logfire.error call causes message formatting to fail

The logfire.error call at line 44 uses _exc_info=retry_exc but does not pass the error attribute needed by the {error} template placeholder. The _exc_info parameter is a special named parameter captured separately from **attributes (see logfire/_internal/main.py:464-494), so it does not populate an error key in the attributes dict. The formatter will fail to resolve {error}, triggering a KnownFormattingError that falls back to using the raw template string 'Annotations batch retry failed: {error}' as the message — losing the actual error details.

The correct pattern (used elsewhere in the codebase at logfire/variables/remote.py:166) is to pass both the attribute and _exc_info: logfire.error('...{error}', error=str(retry_exc), _exc_info=retry_exc).

Suggested change:
- logfire.error('Annotations batch retry failed: {error}', _exc_info=retry_exc)
+ logfire.error('Annotations batch retry failed: {error}', error=str(retry_exc), _exc_info=retry_exc)
