chore: bump deps + adopt pydantic-ai 1.79 patterns#13

Merged
LukeMainwaring merged 4 commits into main from update-deps/2026-04-10
Apr 10, 2026

@LukeMainwaring

Owner

Summary

  • Bumps backend + frontend dependencies to latest (pydantic-ai 1.77→1.79, next 16.2.2→16.2.3 with CVE-2026-23869 backport, axios 1.14→1.15 with two security fixes, ~12 other patch/minor bumps).
  • Adopts the pydantic-ai 1.79 patterns that fit this codebase — native history_processors= parameter, Hooks capability for tool-failure recovery, and a pydantic_evals scaffold for tool-routing regression tests.
  • Adds an optional AGENT_REASONING_EFFORT config knob (default off) wiring OpenAIResponsesModelSettings.openai_reasoning_effort through the agent, ready to experiment behind the new eval suite.

Changes

Dependency bumps (a82dd54)

  • Backend: pydantic-ai 1.77→1.79, ruff 0.15.9→0.15.10, plus transitive refreshes (anthropic 0.91→0.93, openai 2.30→2.31, huggingface-hub 1.9→1.10, temporalio 1.24→1.25, etc.)
  • Frontend: next 16.2.2→16.2.3, ai 6.0.151→6.0.156, @ai-sdk/react 3.0.153→3.0.158, @tanstack/react-query 5.96→5.97, react/react-dom 19.2.4→19.2.5, axios 1.14→1.15, lucide-react 1.7→1.8, @biomejs/biome 2.4.10→2.4.11, ultracite 7.4.3→7.4.4, plus postcss, @types/node
  • docs/pydantic-ai-llms-full.txt refreshed: upstream restructured the file from tutorial-first to reference-first, so it shrank from 168k→74k lines. No APIs were deprecated; the tutorial content moved to the web docs site. See .claude/rules/backend/pydantic-ai.md for details.

Agent pattern adoption (1f06f5b)

  • backend/src/cortexdj/agents/brain_agent.py: summarize_tool_results is now registered via the native history_processors= Agent parameter instead of being wrapped in HistoryProcessor(...) inside capabilities=[]. This matches every example in the updated upstream docs.
  • backend/src/cortexdj/agents/hooks.py (new) — build_brain_agent_hooks() returns a Hooks capability with on_tool_execute_error as a safety net for unanticipated tool exceptions (anticipated Spotify/DB errors are still handled inside the tools themselves). The hook returns a structured {"error": "tool_failed", ...} recovery payload so the agent can apologize conversationally instead of crashing the Vercel AI SDK stream mid-response.
  • backend/tests/evals/ (new directory):
    • test_prepare_tools.py — 8 deterministic TestModel-backed tests verifying PlaylistCapability.prepare_tools and ClassificationCapability.prepare_tools filter the offered tool set correctly when Spotify / EEG model are missing.
    • test_brain_agent_evals.py — 7 real-model tool-routing cases using pydantic_evals.Dataset/Case/Evaluator, gated behind @pytest.mark.eval.
  • backend/pyproject.toml adds pydantic-evals>=1.79.0 to dev deps, registers the eval marker, and sets addopts = "-m 'not eval'" so real-model tests are excluded from the default pytest run.
  • backend/src/cortexdj/core/config.py — adds AGENT_REASONING_EFFORT: Literal["low","medium","high"] | None = None. When set, brain_agent.py threads it through OpenAIResponsesModelSettings(openai_reasoning_effort=...) via the model_settings= parameter. Default None preserves current behavior.
  • .claude/rules/backend/pydantic-ai.md (new) — explains the upstream docs reorganization (llms-full.txt is now reference-only, tutorials live at ai.pydantic.dev) so future sessions don't grep for walkthroughs that have been moved out.
  • backend/tests/test_brain_agent_hooks.py (new) — 4 unit tests for the recovery payload helper and the _recover_tool_error handler using real ToolCallPart/ToolDefinition instances.
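
The safety-net behavior described for hooks.py can be sketched without pydantic-ai. This is an illustration only, not the code in this PR: `recovery_payload` and `run_tool_safely` are hypothetical names, and every payload key other than `"error": "tool_failed"` is an assumption, since the description elides the rest of the payload.

```python
from typing import Any, Callable


def recovery_payload(tool_name: str, exc: Exception) -> dict[str, Any]:
    """Build a structured payload the model can read instead of a traceback.

    Only the "error": "tool_failed" entry comes from the PR description;
    the "tool" and "detail" keys are hypothetical.
    """
    return {
        "error": "tool_failed",
        "tool": tool_name,
        "detail": f"{type(exc).__name__}: {exc}",
    }


def run_tool_safely(tool_name: str, tool: Callable[[], Any]) -> Any:
    """Call a tool and turn any unanticipated exception into a payload."""
    try:
        return tool()
    except Exception as exc:  # safety net only; tools still handle known errors
        return recovery_payload(tool_name, exc)


def flaky_playlists() -> list[str]:
    # Simulates the temporary RuntimeError used in the manual smoke test.
    raise RuntimeError("boom")


result = run_tool_safely("get_my_playlists", flaky_playlists)
```

Because the failure comes back as ordinary tool output, the model can phrase an apology itself and the response stream is never interrupted by the exception.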

Code review fixes (c3c39a0)

  • hooks.py uses the public Hooks[AgentDeps](tool_execute_error=...) constructor form instead of the decorator, removing private-API access from tests.
  • Dummy OPENAI_API_KEY moved to session-level backend/tests/conftest.py (eliminates noqa: E402 gymnastics in the evals conftest).
  • tests/evals/conftest.py exposes fake_spotify_client() / fake_eeg_model() helpers via MagicMock(spec=...) so test intent is explicit and type-checked.
  • test_brain_agent_evals.py: BrainAgentInput now carries with_spotify, so build_relaxation_playlist_needs_confirmation runs with a fake Spotify client and actually tests the confirmation flow instead of passing for the wrong reason (the agent refusing because Spotify was disconnected).
  • brain_agent.py drops the explicit OpenAIResponsesModelSettings | None annotation (mypy infers it).
  • DEVELOPMENT.md documents the -m eval two-tier test workflow under the Tests section.
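
A rough sketch of how the `with_spotify` flag and a `MagicMock(spec=...)` helper could fit together. `SpotifyClient`, the `prompt` field, and `spotify_for` are hypothetical stand-ins; only `BrainAgentInput`, `with_spotify`, and `fake_spotify_client()` are named in this PR, and their real definitions may differ.

```python
from dataclasses import dataclass
from typing import Optional
from unittest.mock import MagicMock


class SpotifyClient:
    """Hypothetical stand-in for the project's real Spotify client class."""

    def search_tracks(self, query: str) -> list[str]:
        raise NotImplementedError


def fake_spotify_client() -> MagicMock:
    # spec= limits the mock to attributes that exist on SpotifyClient,
    # so a typo such as client.serch_tracks raises AttributeError.
    return MagicMock(spec=SpotifyClient)


@dataclass
class BrainAgentInput:
    prompt: str  # hypothetical field name
    with_spotify: bool = False  # whether the case runs with a connected client


def spotify_for(case_input: BrainAgentInput) -> Optional[MagicMock]:
    """Return a fake client for connected cases, None for disconnected ones."""
    return fake_spotify_client() if case_input.with_spotify else None
```

With the flag set, a confirmation-flow case exercises the connected path; without it, the same case would only ever see the disconnected refusal.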

Breaking Changes

None. Existing behavior is preserved across the board:

  • history_processors= and HistoryProcessor(...) inside capabilities=[] are both valid; this PR swaps one for the other with no functional change.
  • Hooks is additive — on_tool_execute_error only fires on unhandled exceptions; tools that return structured error dicts work unchanged.
  • AGENT_REASONING_EFFORT defaults to None → no OpenAIResponsesModelSettings is constructed → agent behaves identically to before the PR unless the env var is explicitly set.
  • addopts = "-m 'not eval'" excludes new eval-marked tests from the default pytest invocation; all existing tests still run.
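
The "default None means no settings object" behavior can be illustrated roughly as follows. This is a sketch under assumptions: `reasoning_settings` is a hypothetical helper, a plain dict stands in for `OpenAIResponsesModelSettings`, and the manual validation stands in for whatever the project's config class does with the `Literal` type.

```python
from typing import Mapping, Optional

ALLOWED_EFFORTS = ("low", "medium", "high")


def reasoning_settings(env: Mapping[str, str]) -> Optional[dict[str, str]]:
    """Return model settings only when AGENT_REASONING_EFFORT is set."""
    effort = env.get("AGENT_REASONING_EFFORT")
    if effort is None:
        # Unset: build nothing, so the agent runs exactly as before.
        return None
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"AGENT_REASONING_EFFORT must be one of {ALLOWED_EFFORTS}")
    # Plain-dict stand-in for the settings passed via model_settings=.
    return {"openai_reasoning_effort": effort}
```

In this sketch, `reasoning_settings(os.environ)` would be evaluated once at agent construction time (with `import os`), and its result passed as the model settings.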

Test Plan

  • uv run --directory backend pytest → 57 passed, 1 deselected
  • uv run --directory backend pre-commit run --all-files → all hooks pass (ruff, ruff-format, mypy)
  • pnpm -C frontend lint → clean
  • code-reviewer agent review applied (should-fixes + nice-to-haves)
  • Manual smoke: `docker compose up -d` + `pnpm -C frontend dev`, send a 3-turn conversation invoking `analyze_session` + `build_mood_playlist` + `search_tracks` with a large result. Confirm UI streams cleanly and logfire spans show `summarize_tool_results` activity on turn ≥2.
  • Manual smoke: force an unhandled exception in a tool (e.g. add a temporary `raise RuntimeError` inside `get_my_playlists`), confirm the agent responds with an apology instead of crashing the stream.
  • Optional: `uv run --directory backend pytest -m eval tests/evals/` against real `gpt-5.4-mini` (requires `OPENAI_API_KEY`). Gated out of the default pytest run and CI; run manually or nightly.
  • Optional experiment: set `AGENT_REASONING_EFFORT=medium` in `.env`, re-run the eval suite, compare tool-routing pass rate vs. default. Revert if no improvement.
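
For context on the first manual smoke check: a history processor is a callable that receives the message history and returns a (possibly rewritten) history before each model request. This PR does not describe how `summarize_tool_results` works internally, so the sketch below assumes one plausible behavior, truncating oversized tool results from earlier turns. `Message`, `MAX_CHARS`, and the truncation rule are all hypothetical.

```python
from dataclasses import dataclass, replace

MAX_CHARS = 200  # hypothetical size threshold


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a pydantic-ai message object."""

    turn: int
    kind: str  # "user", "assistant", or "tool_result"
    content: str


def summarize_tool_results(messages: list[Message]) -> list[Message]:
    """Shorten large tool results from earlier turns; keep the latest turn intact."""
    if not messages:
        return []
    latest_turn = max(m.turn for m in messages)
    processed = []
    for m in messages:
        is_old_result = m.kind == "tool_result" and m.turn < latest_turn
        if is_old_result and len(m.content) > MAX_CHARS:
            m = replace(m, content=m.content[:MAX_CHARS] + " [truncated]")
        processed.append(m)
    return processed
```

Under this assumption nothing is rewritten on the first turn, because no earlier turn exists yet, which would be consistent with expecting activity only from turn 2 onward.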

🤖 Generated with Claude Code

LukeMainwaring and others added 4 commits April 10, 2026 07:23
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…processors

Applies the approved adoption plan from the 1.77→1.79 docs review:

- Move summarize_tool_results to the native `history_processors=` Agent
  parameter instead of wrapping it in the HistoryProcessor capability —
  matches every example in the restructured upstream docs.
- Add `Hooks` capability with `on_tool_execute_error` as a safety net for
  unanticipated exceptions in tool bodies (anticipated Spotify/DB errors
  are still handled inside the tools themselves). Recovery payload lets
  the agent respond conversationally instead of crashing the SSE stream.
- Add `AGENT_REASONING_EFFORT` config knob wiring reasoning through
  `OpenAIResponsesModelSettings`. Defaults to None (no behavior change);
  enable via env var and validate against the eval suite before shipping.
- Scaffold pydantic-evals at backend/tests/evals/:
  * `test_prepare_tools.py` — 8 deterministic TestModel-backed tests
    verifying runtime tool filtering for Playlist/Classification caps.
  * `test_brain_agent_evals.py` — 7 real-model tool-routing cases gated
    behind `@pytest.mark.eval`, deselected from the default pytest run.
- Add `.claude/rules/backend/pydantic-ai.md` — flags that llms-full.txt
  is now reference-only (as of the 1.79 refresh) and guides are at the
  web docs site, so future Claude runs don't grep for walkthroughs that
  have been moved out of the local snapshot.

Test plan: `pytest` → 59 passed, 1 deselected. Pre-commit (ruff, mypy,
format) clean. Eval suite runs with `pytest -m eval` and requires a real
OPENAI_API_KEY — intentionally kept out of CI default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- hooks.py: switch to the public Hooks(tool_execute_error=...) constructor
  form; drops the decorator indirection for a single hook and removes the
  private `_get()` access from tests.
- test_brain_agent_hooks.py: replace _FakeCall/_FakeToolDef stubs with real
  ToolCallPart / ToolDefinition instances; tests now break if pydantic-ai
  renames those fields instead of silently drifting.
- tests/conftest.py: move the dummy OPENAI_API_KEY setup to the session-
  level root conftest so it runs once before any test imports brain_agent.
  Eval conftest no longer needs noqa: E402 gymnastics.
- tests/evals/conftest.py: add fake_spotify_client() and fake_eeg_model()
  helpers using MagicMock(spec=...) so intent is explicit and type-checked.
- test_prepare_tools.py: use the new MagicMock(spec=...) helpers; stops
  hinting that the capabilities "check truthiness" (they check `is None`).
- test_brain_agent_evals.py: BrainAgentInput now carries with_spotify so
  build_relaxation_playlist_needs_confirmation can run with a fake Spotify
  client — previously it was passing for the wrong reason (agent refusing
  because Spotify was disconnected, not because it was awaiting confirm).
- brain_agent.py: drop the explicit OpenAIResponsesModelSettings | None
  annotation; mypy infers it.
- DEVELOPMENT.md: document the -m eval two-tier test workflow under Tests.
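
The `is None` versus truthiness distinction mentioned for test_prepare_tools.py can be shown with a simplified filter. This is a sketch, not the capability code: the real `prepare_tools` works on pydantic-ai tool definitions rather than names, and the grouping in `SPOTIFY_TOOLS` is an assumption (the PR does not list which tools need Spotify).

```python
from typing import Optional
from unittest.mock import MagicMock

# Hypothetical grouping of the tool names that appear in this PR.
SPOTIFY_TOOLS = {"build_mood_playlist", "get_my_playlists", "search_tracks"}
ALL_TOOLS = ["analyze_session", "build_mood_playlist", "get_my_playlists", "search_tracks"]


def prepare_tools(tool_names: list[str], spotify_client: Optional[object]) -> list[str]:
    """Drop Spotify-backed tools when no client is configured."""
    if spotify_client is None:  # identity check, not truthiness
        return [name for name in tool_names if name not in SPOTIFY_TOOLS]
    return list(tool_names)


class FalsyClient:
    """A configured client that happens to evaluate as False."""

    def __bool__(self) -> bool:
        return False
```

A truthiness check (`if not spotify_client`) would wrongly hide the Spotify tools for `FalsyClient()`, whereas the identity check keeps them.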

pytest → 57 passed, 1 deselected. pre-commit clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The original cut was ~63 lines and mostly restated facts already in
CLAUDE.md or trivially grep-able from the code (capability class names,
sdk_version=6, logfire wiring, SUMMARIZABLE_TOOLS location). Those
bullets invite staleness — rename a capability, bump sdk_version, or
add a fifth capability and the rule rots silently.

Keep only the two things that aren't derivable from grepping:
1. The split between llms-full.txt (reference) and ai.pydantic.dev/
   (guides) — the one fact that saves future sessions from fruitlessly
   grepping for walkthroughs that have been moved upstream.
2. The policy that tool error handling should flow through the hook
   rather than in-tool try/except — a behavioral rule, not a code fact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@LukeMainwaring LukeMainwaring merged commit 0d062fa into main Apr 10, 2026
4 checks passed