chore: bump deps + adopt pydantic-ai 1.79 patterns by LukeMainwaring · Pull Request #41 · LukeMainwaring/samplespace

LukeMainwaring · 2026-04-10T15:09:40Z

Summary

Bumps backend + frontend dependencies to latest (pydantic-ai 1.77→1.79, next 16.2.2→16.2.3 with CVE-2026-23869 backport, axios 1.14→1.15 with two security fixes, pytest 9.0.2→9.0.3 with CVE-2025-71176 fix, ~13 other patch/minor bumps).
Adopts the pydantic-ai 1.79 patterns from LukeMainwaring/cortexdj#13 that fit samplespace — a Hooks capability for tool-failure recovery and a pydantic_evals scaffold for tool-routing regression tests.
Adds an optional AGENT_REASONING_EFFORT config knob (default off) that wires OpenAIResponsesModelSettings.openai_reasoning_effort through the agent, so reasoning-effort experiments can be validated against the new eval suite.

Changes

Dependency bumps (32ee14d)

Backend: pydantic-ai 1.77→1.79, transformers 5.5.0→5.5.3, uvicorn 0.43→0.44, pytest 9.0.2→9.0.3, ruff 0.15.9→0.15.10.
Frontend: next 16.2.2→16.2.3, ai 6.0.146→6.0.156, @ai-sdk/react 3.0.148→3.0.158, @tanstack/react-query 5.96→5.97, react/react-dom 19.2.4→19.2.5, axios 1.14→1.15, lucide-react 1.7→1.8, @biomejs/biome 2.4.10→2.4.11, ultracite 7.4.3→7.4.4, postcss, @types/node, wavesurfer.js.
docs/pydantic-ai-llms-full.txt refreshed — upstream restructured from tutorial-first to reference-first, so the file shrank from 150k+→74.7k lines. No APIs deprecated; tutorial content moved to the web docs site. See .claude/rules/backend/pydantic-ai.md for details.

Agent pattern adoption (05b70e0)

backend/src/samplespace/agents/hooks.py (new) — build_sample_agent_hooks() returns a Hooks[AgentDeps](tool_execute_error=_recover_tool_error) capability as a safety net for unanticipated tool exceptions. The hook returns a plain-string recovery message (matching samplespace's existing tool return style) so the agent can apologize conversationally instead of crashing the Vercel AI SDK stream mid-response. Wired into sample_agent.py as a capability entry.
backend/src/samplespace/core/config.py — adds AGENT_REASONING_EFFORT: Literal["low","medium","high"] | None = None. When set, sample_agent.py threads it through OpenAIResponsesModelSettings(openai_reasoning_effort=...) via the model_settings= parameter. Default None preserves current behavior.
backend/tests/evals/ (new directory):
- test_prepare_tools.py — 4 deterministic TestModel-backed tests verifying SearchCapability.prepare_tools correctly filters the CNN-gated find_similar_samples tool based on whether cnn_model is loaded, and that all CLAP / upload / analysis / context / pairing / production tools stay offered regardless.
- test_sample_agent_evals.py — 7 real-model tool-routing cases using pydantic_evals.Dataset/Case/Evaluator (CLAP search, proactive set_song_context, CNN similarity, kit builder, pairing session, upload flow, CNN-hidden gating), gated behind @pytest.mark.eval.
- conftest.py — make_fake_deps() plus fake_clap_model() / fake_clap_processor() / fake_cnn_model() helpers via MagicMock(spec=...).
backend/tests/test_sample_agent_hooks.py (new) — 4 unit tests for the recovery message helper and the _recover_tool_error handler using real ToolCallPart / ToolDefinition instances.
backend/pyproject.toml adds pydantic-evals>=1.79.0 to dev deps, registers the eval marker, and sets addopts = "-m 'not eval'" so real-model tests are excluded from the default pytest run.
backend/tests/conftest.py — session-level dummy OPENAI_API_KEY so importing sample_agent during test collection doesn't trip OpenAIResponsesModel's eager client construction. Tests that actually invoke the model use agent.override with TestModel; this value never reaches a real API call.
.claude/rules/backend/pydantic-ai.md (new) — minimal rules file explaining (a) that llms-full.txt is now reference-only and tutorials live on the web, and (b) the tool-error-handling convention (let unanticipated exceptions propagate to the hook).
DEVELOPMENT.md — documents the -m eval two-tier test workflow under the Testing section.
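
For reviewers who want a mental model of the CNN gating that test_prepare_tools.py checks, here is a minimal stdlib-only sketch. ToolDef, Deps, CNN_GATED_TOOLS and the free function prepare_tools are hypothetical stand-ins; the real SearchCapability.prepare_tools works on pydantic-ai tool definitions and AgentDeps, and its exact signature is not shown in this PR description.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a pydantic-ai tool definition.
@dataclass(frozen=True)
class ToolDef:
    name: str


# Hypothetical stand-in for AgentDeps; only the field relevant to gating.
@dataclass
class Deps:
    cnn_model: Optional[object] = None  # None means the CNN is not loaded


# Assumption: find_similar_samples is the only CNN-gated tool.
CNN_GATED_TOOLS = frozenset({"find_similar_samples"})


def prepare_tools(deps: Deps, tool_defs: list[ToolDef]) -> list[ToolDef]:
    """Hide CNN-gated tools when no CNN model is loaded; keep all others."""
    if deps.cnn_model is not None:
        return list(tool_defs)
    return [t for t in tool_defs if t.name not in CNN_GATED_TOOLS]


tools = [
    ToolDef("search_by_description"),
    ToolDef("find_similar_samples"),
    ToolDef("build_kit"),
]
without_cnn = [t.name for t in prepare_tools(Deps(), tools)]
with_cnn = [t.name for t in prepare_tools(Deps(cnn_model=object()), tools)]
```

With no CNN loaded, only find_similar_samples drops out of the offered list; the other tools are offered in both cases, which is the property the deterministic tests assert.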

Code review fixes (9fe1673)

pyproject.toml + DEVELOPMENT.md — dropped the unused main / additional pytest markers (zero tests referenced them), so addopts = "-m 'not eval'" no longer has a composition gotcha with -m invocations of markers that were only aspirational.
hooks.py — logger.exception → logger.error(..., exc_info=error) to avoid relying on pydantic-ai's implicit sys.exc_info() frame, and args: Any → args: dict[str, Any] to match the documented ValidatedToolArgs shape. Docstring reference updated from on_tool_execute_error to the constructor kwarg name tool_execute_error.
test_sample_agent_evals.py — ExpectedToolCalled / AnyOfToolsCalled / NoToolCalled now raise ValueError from __post_init__ when instantiated with empty defaults, so a forgotten argument fails loudly instead of silently matching nothing. Tool-call filter switched from hasattr("tool_name") and hasattr("args") to isinstance(part, ToolCallPart) for robustness against future part-shape changes.
sample_agent.py — reasoning-effort comment now explicitly references pytest -m eval tests/evals/ as the validation step and flags that it hits OpenAI.
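
The two evaluator fixes above (fail loudly on empty defaults, filter by type instead of by attribute) can be sketched without pydantic-ai installed. The ToolCallPart and TextPart classes below are simplified stand-ins, and evaluate() takes a plain list of parts rather than the evaluator context that pydantic_evals passes in, so treat this as an illustration of the idea and not the code in test_sample_agent_evals.py.

```python
from dataclasses import dataclass


# Simplified stand-ins for pydantic-ai message parts (assumption).
@dataclass(frozen=True)
class ToolCallPart:
    tool_name: str


@dataclass(frozen=True)
class TextPart:
    content: str


@dataclass
class ExpectedToolCalled:
    """Passes when the expected tool appears among the tool calls."""

    tool_name: str = ""

    def __post_init__(self) -> None:
        # A forgotten argument should fail at construction time,
        # not silently match nothing at evaluation time.
        if not self.tool_name:
            raise ValueError("ExpectedToolCalled requires a non-empty tool_name")

    def evaluate(self, parts: list[object]) -> bool:
        # isinstance is stricter than hasattr checks: a future part type
        # that happens to carry a tool_name attribute is not counted.
        called = [p.tool_name for p in parts if isinstance(p, ToolCallPart)]
        return self.tool_name in called


parts = [TextPart("Looking for kits..."), ToolCallPart("build_kit")]
hit = ExpectedToolCalled("build_kit").evaluate(parts)
miss = ExpectedToolCalled("search_by_description").evaluate(parts)
```

Constructing ExpectedToolCalled() with no argument raises ValueError immediately, which is the behavior the review fix is after.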

Breaking Changes

None. Existing behavior is preserved across the board:

Hooks is additive — tool_execute_error only fires on unhandled exceptions; tools that currently wrap their bodies in try/except and return error strings work unchanged. The convention note in .claude/rules/backend/pydantic-ai.md is guidance for new tools, not a forced migration.
AGENT_REASONING_EFFORT defaults to None → no OpenAIResponsesModelSettings is constructed → agent behaves identically to before this PR unless the env var is explicitly set.
addopts = "-m 'not eval'" excludes new eval-marked tests from the default pytest invocation; all existing tests still run.
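
To illustrate why the hook is additive, here is a rough stdlib-only sketch of the safety-net shape. run_tool is a hypothetical dispatcher standing in for pydantic-ai's tool execution, and recovery_message, recover_tool_error, the logger name, and the message wording are assumptions for illustration, not the contents of hooks.py.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("samplespace.agents.hooks")  # name is an assumption


def recovery_message(tool_name: str) -> str:
    """Plain-string result the agent can turn into a conversational apology."""
    return f"The {tool_name} tool failed unexpectedly. Apologize and suggest retrying."


def recover_tool_error(tool_name: str, args: dict[str, Any], error: Exception) -> str:
    # Pass the exception explicitly instead of relying on sys.exc_info().
    logger.error("Tool %s failed with args %r", tool_name, args, exc_info=error)
    return recovery_message(tool_name)


def run_tool(tool: Callable[..., str], tool_name: str, args: dict[str, Any]) -> str:
    """Hypothetical dispatcher: the hook only sees exceptions the tool lets escape."""
    try:
        return tool(**args)
    except Exception as error:
        return recover_tool_error(tool_name, args, error)


def guarded_tool(query: str) -> str:
    # Existing style: the tool catches its own errors and returns a string.
    try:
        raise LookupError("index offline")
    except LookupError as exc:
        return f"Search failed: {exc}"


def unguarded_tool(query: str) -> str:
    # Newer style: let unanticipated exceptions propagate to the hook.
    raise RuntimeError("boom")


guarded = run_tool(guarded_tool, "search_by_description", {"query": "warm pad"})
unguarded = run_tool(unguarded_tool, "search_by_description", {"query": "warm pad"})
```

The guarded tool's own error string passes through untouched, while the unguarded tool's exception is logged and replaced by the recovery message, so neither path interrupts the response stream.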

Test Plan

uv run --directory backend pytest → 62 passed, 1 deselected
uv run --directory backend pre-commit run --all-files → all hooks pass (ruff, ruff-format, mypy strict)
pnpm -C frontend lint → clean
code-reviewer agent review applied (should-fixes + nice-to-haves)
Manual smoke: docker compose up -d + pnpm -C frontend dev, send a multi-turn conversation invoking search_by_description + build_kit + a pairing session. Confirm UI streams cleanly and logfire spans show the new hook activity.
Manual smoke: force an unhandled exception in a tool (e.g. add a temporary raise RuntimeError inside search_by_description), confirm the agent responds with an apology instead of crashing the stream.
Optional: uv run --directory backend pytest -m eval tests/evals/ against real gpt-5.4-mini (requires OPENAI_API_KEY). Gated out of the default pytest run and CI; run manually or nightly.
Optional experiment: set AGENT_REASONING_EFFORT=medium in .env, re-run the eval suite, compare tool-routing pass rate vs. default. Revert if no improvement.
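
For the experiment above, the path from env var to model settings looks roughly like the sketch below. read_reasoning_effort and build_model_settings are hypothetical helpers; the real code reads the value through the settings class in core/config.py and constructs OpenAIResponsesModelSettings, which is represented here as a plain dict. Treating an empty string as unset is also an assumption.

```python
from typing import Any, Optional

ALLOWED_EFFORTS = ("low", "medium", "high")


def read_reasoning_effort(env: dict[str, str]) -> Optional[str]:
    """Return the configured effort, or None when the knob is unset."""
    raw = env.get("AGENT_REASONING_EFFORT", "").strip()
    if not raw:
        return None  # default: knob is off
    if raw not in ALLOWED_EFFORTS:
        raise ValueError(f"AGENT_REASONING_EFFORT must be one of {ALLOWED_EFFORTS}")
    return raw


def build_model_settings(effort: Optional[str]) -> Optional[dict[str, Any]]:
    """Build settings only when an effort is set, so the default path is unchanged."""
    if effort is None:
        return None
    # Dict stand-in for OpenAIResponsesModelSettings(openai_reasoning_effort=...).
    return {"openai_reasoning_effort": effort}


default_settings = build_model_settings(read_reasoning_effort({}))
medium_settings = build_model_settings(
    read_reasoning_effort({"AGENT_REASONING_EFFORT": "medium"})
)
```

Leaving the variable unset yields None, so no settings object reaches the agent; setting it to medium produces a single-key settings mapping that can be compared against the default run in the eval suite.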

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LukeMainwaring and others added 3 commits April 10, 2026 07:28

chore: bump all dependencies to latest versions

32ee14d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor with cortexdj changes as reference

05b70e0

pr feedback

9fe1673

LukeMainwaring merged commit 39e17cf into main Apr 10, 2026
3 checks passed

LukeMainwaring merged 3 commits into main from update-deps/2026-04-10
