chore: bump deps + adopt pydantic-ai 1.79 patterns#41

Merged
LukeMainwaring merged 3 commits into main from update-deps/2026-04-10
Apr 10, 2026

@LukeMainwaring
Owner

Summary

  • Bumps backend + frontend dependencies to latest (pydantic-ai 1.77→1.79, next 16.2.2→16.2.3 with CVE-2026-23869 backport, axios 1.14→1.15 with two security fixes, pytest 9.0.2→9.0.3 with CVE-2025-71176 fix, ~13 other patch/minor bumps).
  • Adopts the pydantic-ai 1.79 patterns from LukeMainwaring/cortexdj#13 that fit samplespace — a Hooks capability for tool-failure recovery and a pydantic_evals scaffold for tool-routing regression tests.
  • Adds an optional AGENT_REASONING_EFFORT config knob (default off) wiring OpenAIResponsesModelSettings.openai_reasoning_effort through the agent, ready to experiment behind the new eval suite.

Changes

Dependency bumps (32ee14d)

  • Backend: pydantic-ai 1.77→1.79, transformers 5.5.0→5.5.3, uvicorn 0.43→0.44, pytest 9.0.2→9.0.3, ruff 0.15.9→0.15.10.
  • Frontend: next 16.2.2→16.2.3, ai 6.0.146→6.0.156, @ai-sdk/react 3.0.148→3.0.158, @tanstack/react-query 5.96→5.97, react/react-dom 19.2.4→19.2.5, axios 1.14→1.15, lucide-react 1.7→1.8, @biomejs/biome 2.4.10→2.4.11, ultracite 7.4.3→7.4.4, postcss, @types/node, wavesurfer.js.
  • docs/pydantic-ai-llms-full.txt refreshed — upstream restructured from tutorial-first to reference-first, so the file shrank from 150k+→74.7k lines. No APIs deprecated; tutorial content moved to the web docs site. See .claude/rules/backend/pydantic-ai.md for details.

Agent pattern adoption (05b70e0)

  • backend/src/samplespace/agents/hooks.py (new) — build_sample_agent_hooks() returns a Hooks[AgentDeps](tool_execute_error=_recover_tool_error) capability as a safety net for unanticipated tool exceptions. The hook returns a plain-string recovery message (matching samplespace's existing tool return style) so the agent can apologize conversationally instead of crashing the Vercel AI SDK stream mid-response. Wired into sample_agent.py as a capability entry.
  • backend/src/samplespace/core/config.py — adds AGENT_REASONING_EFFORT: Literal["low","medium","high"] | None = None. When set, sample_agent.py threads it through OpenAIResponsesModelSettings(openai_reasoning_effort=...) via the model_settings= parameter. Default None preserves current behavior.
  • backend/tests/evals/ (new directory):
    • test_prepare_tools.py — 4 deterministic TestModel-backed tests verifying SearchCapability.prepare_tools correctly filters the CNN-gated find_similar_samples tool based on whether cnn_model is loaded, and that all CLAP / upload / analysis / context / pairing / production tools stay offered regardless.
    • test_sample_agent_evals.py — 7 real-model tool-routing cases using pydantic_evals.Dataset/Case/Evaluator (CLAP search, proactive set_song_context, CNN similarity, kit builder, pairing session, upload flow, CNN-hidden gating), gated behind @pytest.mark.eval.
    • conftest.py: make_fake_deps() plus fake_clap_model() / fake_clap_processor() / fake_cnn_model() helpers via MagicMock(spec=...).
  • backend/tests/test_sample_agent_hooks.py (new) — 4 unit tests for the recovery message helper and the _recover_tool_error handler using real ToolCallPart / ToolDefinition instances.
  • backend/pyproject.toml adds pydantic-evals>=1.79.0 to dev deps, registers the eval marker, and sets addopts = "-m 'not eval'" so real-model tests are excluded from the default pytest run.
  • backend/tests/conftest.py — session-level dummy OPENAI_API_KEY so importing sample_agent during test collection doesn't trip OpenAIResponsesModel's eager client construction. Tests that actually invoke the model use agent.override with TestModel; this value never reaches a real API call.
  • .claude/rules/backend/pydantic-ai.md (new) — minimal rules file explaining (a) that llms-full.txt is now reference-only and tutorials live on the web, and (b) the tool-error-handling convention (let unanticipated exceptions propagate to the hook).
  • DEVELOPMENT.md — documents the -m eval two-tier test workflow under the Testing section.
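
The tool-error safety net described for hooks.py can be pictured with a minimal, framework-free sketch. This is not the pydantic-ai Hooks API and not the code in this PR: run_tool, recovery_message, and the message wording are hypothetical stand-ins, and only the idea (log the exception explicitly, return a plain-string recovery message instead of letting the error escape) comes from the description above.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("samplespace.agents.hooks")


def recovery_message(tool_name: str) -> str:
    """Plain-string message the model can relay conversationally (wording is assumed)."""
    return (
        f"The tool '{tool_name}' failed unexpectedly. "
        "Apologize to the user and suggest retrying or rephrasing."
    )


def recover_tool_error(tool_name: str, args: dict[str, Any], error: Exception) -> str:
    # Pass the exception explicitly instead of relying on the implicit
    # sys.exc_info() frame being active at the call site.
    logger.error("Tool %s failed with args %r", tool_name, args, exc_info=error)
    return recovery_message(tool_name)


def run_tool(tool: Callable[..., str], tool_name: str, args: dict[str, Any]) -> str:
    """Hypothetical stand-in for the framework's tool dispatch."""
    try:
        return tool(**args)
    except Exception as error:  # safety net for unanticipated failures only
        return recover_tool_error(tool_name, args, error)


def search_by_description(query: str) -> str:
    # Simulates the forced failure used in the manual smoke test.
    raise RuntimeError("index unavailable")


result = run_tool(search_by_description, "search_by_description", {"query": "warm pad"})
```

In this sketch the caller always receives a string, so a streaming response can continue with an apology rather than terminating mid-response.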

Code review fixes (9fe1673)

  • pyproject.toml + DEVELOPMENT.md: dropped the unused main / additional pytest markers (no tests referenced them), so that addopts = "-m 'not eval'" has no composition gotcha with invocations of markers that no test uses.
  • hooks.py: logger.exception → logger.error(..., exc_info=error) to avoid relying on pydantic-ai's implicit sys.exc_info() frame, and args: Any → args: dict[str, Any] to match the documented ValidatedToolArgs shape. Docstring reference updated from on_tool_execute_error to the constructor kwarg name tool_execute_error.
  • test_sample_agent_evals.py: ExpectedToolCalled / AnyOfToolsCalled / NoToolCalled now raise ValueError from __post_init__ when instantiated with empty defaults, so a forgotten argument fails loudly instead of silently matching nothing. Tool-call filter switched from hasattr("tool_name") and hasattr("args") to isinstance(part, ToolCallPart) for robustness against future part-shape changes.
  • sample_agent.py — reasoning-effort comment now explicitly references pytest -m eval tests/evals/ as the validation step and flags that it hits OpenAI.
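
The evaluator hardening above can be sketched with standard-library dataclasses. The ToolCallPart and TextPart classes here are simplified stand-ins for pydantic-ai's message parts, and the evaluate signature is an assumption for illustration; the real evaluators build on pydantic_evals and may look different.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCallPart:
    """Simplified stand-in for pydantic-ai's ToolCallPart."""
    tool_name: str
    args: dict[str, Any] = field(default_factory=dict)


@dataclass
class TextPart:
    """Simplified stand-in for a non-tool message part."""
    content: str


@dataclass
class ExpectedToolCalled:
    tool_name: str = ""

    def __post_init__(self) -> None:
        # An empty default would silently match nothing, so fail loudly.
        if not self.tool_name:
            raise ValueError("ExpectedToolCalled requires a non-empty tool_name")

    def evaluate(self, parts: list[object]) -> bool:
        # isinstance is stricter than hasattr checks: a part that merely
        # happens to carry tool_name/args attributes is not counted.
        called = [p.tool_name for p in parts if isinstance(p, ToolCallPart)]
        return self.tool_name in called


parts = [TextPart("Searching..."), ToolCallPart("search_by_description", {"query": "kick"})]
matched = ExpectedToolCalled("search_by_description").evaluate(parts)
```

Constructing ExpectedToolCalled() with no argument raises ValueError at creation time, which surfaces a forgotten argument before any eval case runs.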

Breaking Changes

None. Existing behavior is preserved across the board:

  • Hooks is additive — tool_execute_error only fires on unhandled exceptions; tools that currently wrap their bodies in try/except and return error strings work unchanged. The convention note in .claude/rules/backend/pydantic-ai.md is guidance for new tools, not a forced migration.
  • AGENT_REASONING_EFFORT defaults to None → no OpenAIResponsesModelSettings is constructed → agent behaves identically to before this PR unless the env var is explicitly set.
  • addopts = "-m 'not eval'" excludes new eval-marked tests from the default pytest invocation; all existing tests still run.
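
The "default None means no settings object" behavior can be illustrated with a small sketch. A plain dict stands in for OpenAIResponsesModelSettings here, and build_model_settings is a hypothetical helper name, not a function from this PR; only the Literal type and the openai_reasoning_effort key come from the description above.

```python
from typing import Literal, Optional

ReasoningEffort = Literal["low", "medium", "high"]


def build_model_settings(effort: Optional[ReasoningEffort]) -> Optional[dict[str, str]]:
    """Return settings only when the knob is set; None keeps prior behavior."""
    if effort is None:
        # Nothing is constructed, so the agent is configured exactly as before.
        return None
    # A dict stands in for OpenAIResponsesModelSettings in this sketch.
    return {"openai_reasoning_effort": effort}


default_settings = build_model_settings(None)
medium_settings = build_model_settings("medium")
```

Returning None rather than an empty settings object keeps the unset case indistinguishable from the pre-change configuration.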

Test Plan

  • uv run --directory backend pytest → 62 passed, 1 deselected
  • uv run --directory backend pre-commit run --all-files → all hooks pass (ruff, ruff-format, mypy strict)
  • pnpm -C frontend lint → clean
  • code-reviewer agent review applied (should-fixes + nice-to-haves)
  • Manual smoke: docker compose up -d + pnpm -C frontend dev, send a multi-turn conversation invoking search_by_description + build_kit + a pairing session. Confirm UI streams cleanly and logfire spans show the new hook activity.
  • Manual smoke: force an unhandled exception in a tool (e.g. add a temporary raise RuntimeError inside search_by_description), confirm the agent responds with an apology instead of crashing the stream.
  • Optional: uv run --directory backend pytest -m eval tests/evals/ against real gpt-5.4-mini (requires OPENAI_API_KEY). Gated out of the default pytest run and CI; run manually or nightly.
  • Optional experiment: set AGENT_REASONING_EFFORT=medium in .env, re-run the eval suite, compare tool-routing pass rate vs. default. Revert if no improvement.

🤖 Generated with Claude Code

@LukeMainwaring LukeMainwaring merged commit 39e17cf into main Apr 10, 2026
3 checks passed