chore: bump deps + adopt pydantic-ai 1.79 patterns by LukeMainwaring · Pull Request #13 · LukeMainwaring/cortexdj

LukeMainwaring · 2026-04-10T14:34:06Z

Summary

Bumps backend + frontend dependencies to latest (pydantic-ai 1.77→1.79, next 16.2.2→16.2.3 with CVE-2026-23869 backport, axios 1.14→1.15 with two security fixes, ~12 other patch/minor bumps).
Adopts the pydantic-ai 1.79 patterns that fit this codebase — native history_processors= parameter, Hooks capability for tool-failure recovery, and a pydantic_evals scaffold for tool-routing regression tests.
Adds an optional AGENT_REASONING_EFFORT config knob (default off) wiring OpenAIResponsesModelSettings.openai_reasoning_effort through the agent, ready to experiment behind the new eval suite.

Changes

Dependency bumps (a82dd54)

Backend: pydantic-ai 1.77→1.79, ruff 0.15.9→0.15.10, plus transitive refreshes (anthropic 0.91→0.93, openai 2.30→2.31, huggingface-hub 1.9→1.10, temporalio 1.24→1.25, etc.)
Frontend: next 16.2.2→16.2.3, ai 6.0.151→6.0.156, @ai-sdk/react 3.0.153→3.0.158, @tanstack/react-query 5.96→5.97, react/react-dom 19.2.4→19.2.5, axios 1.14→1.15, lucide-react 1.7→1.8, @biomejs/biome 2.4.10→2.4.11, ultracite 7.4.3→7.4.4, plus postcss, @types/node
docs/pydantic-ai-llms-full.txt refreshed — the upstream restructured from tutorial-first to reference-first, so the file shrank from 168k→74k lines. No APIs deprecated; content moved to the web docs site. See .claude/rules/backend/pydantic-ai.md for details.

Agent pattern adoption (1f06f5b)

backend/src/cortexdj/agents/brain_agent.py — summarize_tool_results now registered via the native history_processors= Agent parameter instead of wrapped in HistoryProcessor(...) inside capabilities=[]. Matches every example in the updated upstream docs.
backend/src/cortexdj/agents/hooks.py (new) — build_brain_agent_hooks() returns a Hooks capability with on_tool_execute_error as a safety net for unanticipated tool exceptions (anticipated Spotify/DB errors are still handled inside the tools themselves). The hook returns a structured {"error": "tool_failed", ...} recovery payload so the agent can apologize conversationally instead of crashing the Vercel AI SDK stream mid-response.
backend/tests/evals/ (new directory):
- test_prepare_tools.py — 8 deterministic TestModel-backed tests verifying PlaylistCapability.prepare_tools and ClassificationCapability.prepare_tools filter the offered tool set correctly when Spotify / EEG model are missing.
- test_brain_agent_evals.py — 7 real-model tool-routing cases using pydantic_evals.Dataset/Case/Evaluator, gated behind @pytest.mark.eval.
- backend/pyproject.toml adds pydantic-evals>=1.79.0 to dev deps, registers the eval marker, and sets addopts = "-m 'not eval'" so real-model tests are excluded from the default pytest run.
backend/src/cortexdj/core/config.py — adds AGENT_REASONING_EFFORT: Literal["low","medium","high"] | None = None. When set, brain_agent.py threads it through OpenAIResponsesModelSettings(openai_reasoning_effort=...) via the model_settings= parameter. Default None preserves current behavior.
.claude/rules/backend/pydantic-ai.md (new) — explains the upstream docs reorganization (llms-full.txt is now reference-only, tutorials live at ai.pydantic.dev) so future sessions don't grep for walkthroughs that have been moved out.
backend/tests/test_brain_agent_hooks.py (new) — 4 unit tests for the recovery payload helper and the _recover_tool_error handler using real ToolCallPart/ToolDefinition instances.
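
The safety-net behavior described for hooks.py can be sketched without pydantic-ai at all. The snippet below is a minimal illustration, not the code in this PR: `build_recovery_payload` and `call_tool_with_recovery` are hypothetical names, the wrapper function stands in for the `Hooks` capability, and every payload key other than `"error": "tool_failed"` is an assumption.

```python
from typing import Any, Callable


def build_recovery_payload(tool_name: str, exc: Exception) -> dict[str, Any]:
    """Convert an unhandled tool exception into a structured tool result.

    Only the "error": "tool_failed" entry comes from the PR description;
    the remaining keys are illustrative assumptions.
    """
    return {
        "error": "tool_failed",
        "tool": tool_name,
        "exception_type": type(exc).__name__,
        "message": str(exc) or "unknown error",
    }


def call_tool_with_recovery(
    tool_name: str, tool: Callable[..., Any], **kwargs: Any
) -> Any:
    """Run a tool and return a recovery payload instead of propagating.

    Stand-in for the hook: the real code registers a handler on the
    Hooks capability rather than wrapping each call.
    """
    try:
        return tool(**kwargs)
    except Exception as exc:  # safety net for unanticipated failures only
        return build_recovery_payload(tool_name, exc)
```

Under these assumptions, a tool that raises `RuntimeError("boom")` yields a dict the model can read and apologize for, while a tool that already returns its own structured error dict passes through unchanged.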

Code review fixes (c3c39a0)

hooks.py uses the public Hooks[AgentDeps](tool_execute_error=...) constructor form instead of the decorator, removing private-API access from tests.
Dummy OPENAI_API_KEY moved to session-level backend/tests/conftest.py (eliminates noqa: E402 gymnastics in the evals conftest).
tests/evals/conftest.py exposes fake_spotify_client() / fake_eeg_model() helpers via MagicMock(spec=...) so test intent is explicit and type-checked.
test_brain_agent_evals.py — BrainAgentInput now carries with_spotify, so build_relaxation_playlist_needs_confirmation runs with a fake Spotify client and actually tests the confirmation flow instead of passing for the wrong reason (disconnected-refusal).
brain_agent.py drops the explicit OpenAIResponsesModelSettings | None annotation (mypy infers it).
DEVELOPMENT.md documents the -m eval two-tier test workflow under the Tests section.
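
For readers unfamiliar with the `MagicMock(spec=...)` pattern mentioned above, here is a rough sketch of what such helpers might look like. `SpotifyClient` and `EEGModel` are hypothetical stand-ins defined inline so the example is self-contained; the real classes and their methods live in the cortexdj codebase and are not shown in this PR.

```python
from unittest.mock import MagicMock


class SpotifyClient:
    """Hypothetical stand-in for the real Spotify client class."""

    def search_tracks(self, query: str) -> list[dict]:
        raise NotImplementedError


class EEGModel:
    """Hypothetical stand-in for the real EEG classification model."""

    def predict(self, samples: list[float]) -> str:
        raise NotImplementedError


def fake_spotify_client() -> MagicMock:
    # spec= limits the mock to attributes that exist on SpotifyClient,
    # so a typo in a test raises AttributeError instead of passing silently.
    return MagicMock(spec=SpotifyClient)


def fake_eeg_model() -> MagicMock:
    return MagicMock(spec=EEGModel)
```

A spec'd mock passes `isinstance` checks against its spec class and is not `None`, which suits capabilities that test `is None` rather than truthiness.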

Breaking Changes

None. Existing behavior is preserved across the board:

history_processors= and HistoryProcessor(...) inside capabilities=[] are both valid; this PR swaps one for the other with no functional change.
Hooks is additive — on_tool_execute_error only fires on unhandled exceptions; tools that return structured error dicts work unchanged.
AGENT_REASONING_EFFORT defaults to None → no OpenAIResponsesModelSettings is constructed → agent behaves identically to before the PR unless the env var is explicitly set.
addopts = "-m 'not eval'" excludes new eval-marked tests from the default pytest invocation; all existing tests still run.
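
The "default off" guarantee for `AGENT_REASONING_EFFORT` amounts to building no settings object when the value is `None`. A minimal sketch, assuming a local `TypedDict` stand-in instead of importing `OpenAIResponsesModelSettings` (the helper name `build_model_settings` and the stand-in class are hypothetical):

```python
from typing import Literal, Optional, TypedDict

ReasoningEffort = Literal["low", "medium", "high"]


class ResponsesSettings(TypedDict, total=False):
    """Local stand-in for OpenAIResponsesModelSettings (assumption)."""

    openai_reasoning_effort: ReasoningEffort


def build_model_settings(
    effort: Optional[ReasoningEffort],
) -> Optional[ResponsesSettings]:
    """Return settings only when the knob is set; None keeps prior behavior."""
    if effort is None:
        return None
    return ResponsesSettings(openai_reasoning_effort=effort)
```

Passing the result as `model_settings=` would then leave the agent untouched unless the environment variable is explicitly set.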

Test Plan

uv run --directory backend pytest → 57 passed, 1 deselected
uv run --directory backend pre-commit run --all-files → all hooks pass (ruff, ruff-format, mypy)
pnpm -C frontend lint → clean
code-reviewer agent review applied (should-fixes + nice-to-haves)
Manual smoke: `docker compose up -d` + `pnpm -C frontend dev`, send a 3-turn conversation invoking `analyze_session` + `build_mood_playlist` + `search_tracks` with a large result. Confirm UI streams cleanly and logfire spans show `summarize_tool_results` activity on turn ≥2.
Manual smoke: force an unhandled exception in a tool (e.g. add a temporary `raise RuntimeError` inside `get_my_playlists`), confirm the agent responds with an apology instead of crashing the stream.
Optional: `uv run --directory backend pytest -m eval tests/evals/` against real `gpt-5.4-mini` (requires `OPENAI_API_KEY`). Gated out of the default pytest run and CI; run manually or nightly.
Optional experiment: set `AGENT_REASONING_EFFORT=medium` in `.env`, re-run the eval suite, compare tool-routing pass rate vs. default. Revert if no improvement.
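
The comparison in the optional experiment could be as simple as a pass-rate calculation over tool-routing cases. The sketch below uses only the standard library and is not the `pydantic_evals` suite from this PR; `RoutingCase`, `pass_rate`, and the case names are hypothetical, and the observed tool calls are made-up inputs for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingCase:
    """One tool-routing expectation: which tool a prompt should trigger."""

    name: str
    expected_tool: str


def pass_rate(cases: list[RoutingCase], observed: dict[str, list[str]]) -> float:
    """Fraction of cases whose expected tool appears in the observed calls.

    `observed` maps a case name to the tool names the agent actually called.
    """
    if not cases:
        return 0.0
    passed = sum(
        1 for case in cases if case.expected_tool in observed.get(case.name, [])
    )
    return passed / len(cases)


# Hypothetical cases; the tool names are taken from the smoke-test step above.
cases = [
    RoutingCase("analyze_latest_session", "analyze_session"),
    RoutingCase("find_calm_tracks", "search_tracks"),
]
baseline = pass_rate(cases, {"analyze_latest_session": ["analyze_session"]})
with_effort = pass_rate(
    cases,
    {
        "analyze_latest_session": ["analyze_session"],
        "find_calm_tracks": ["search_tracks"],
    },
)
```

With these invented inputs, `baseline` is 0.5 and `with_effort` is 1.0; a real run would take the observed calls from the eval suite output for each setting.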

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…processors

Applies the approved adoption plan from the 1.77→1.79 docs review:

- Move summarize_tool_results to the native `history_processors=` Agent parameter instead of wrapping it in the HistoryProcessor capability — matches every example in the restructured upstream docs.
- Add `Hooks` capability with `on_tool_execute_error` as a safety net for unanticipated exceptions in tool bodies (anticipated Spotify/DB errors are still handled inside the tools themselves). Recovery payload lets the agent respond conversationally instead of crashing the SSE stream.
- Add `AGENT_REASONING_EFFORT` config knob wiring reasoning through `OpenAIResponsesModelSettings`. Defaults to None (no behavior change); enable via env var and validate against the eval suite before shipping.
- Scaffold pydantic-evals at backend/tests/evals/:
  * `test_prepare_tools.py` — 8 deterministic TestModel-backed tests verifying runtime tool filtering for Playlist/Classification caps.
  * `test_brain_agent_evals.py` — 7 real-model tool-routing cases gated behind `@pytest.mark.eval`, deselected from the default pytest run.
- Add `.claude/rules/backend/pydantic-ai.md` — flags that llms-full.txt is now reference-only (as of the 1.79 refresh) and guides are at the web docs site, so future Claude runs don't grep for walkthroughs that have been moved out of the local snapshot.

Test plan: `pytest` → 59 passed, 1 deselected. Pre-commit (ruff, mypy, format) clean. Eval suite runs with `pytest -m eval` and requires a real OPENAI_API_KEY — intentionally kept out of CI default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- hooks.py: switch to the public Hooks(tool_execute_error=...) constructor form; drops the decorator indirection for a single hook and removes the private `_get()` access from tests.
- test_brain_agent_hooks.py: replace _FakeCall/_FakeToolDef stubs with real ToolCallPart / ToolDefinition instances; tests now break if pydantic-ai renames those fields instead of silently drifting.
- tests/conftest.py: move the dummy OPENAI_API_KEY setup to the session-level root conftest so it runs once before any test imports brain_agent. Eval conftest no longer needs noqa: E402 gymnastics.
- tests/evals/conftest.py: add fake_spotify_client() and fake_eeg_model() helpers using MagicMock(spec=...) so intent is explicit and type-checked.
- test_prepare_tools.py: use the new MagicMock(spec=...) helpers; stops hinting that the capabilities "check truthiness" (they check `is None`).
- test_brain_agent_evals.py: BrainAgentInput now carries with_spotify so build_relaxation_playlist_needs_confirmation can run with a fake Spotify client — previously it was passing for the wrong reason (agent refusing because Spotify was disconnected, not because it was awaiting confirm).
- brain_agent.py: drop the explicit OpenAIResponsesModelSettings | None annotation; mypy infers it.
- DEVELOPMENT.md: document the -m eval two-tier test workflow under Tests.

pytest → 57 passed, 1 deselected. pre-commit clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The original cut was ~63 lines and mostly restated facts already in CLAUDE.md or trivially grep-able from the code (capability class names, sdk_version=6, logfire wiring, SUMMARIZABLE_TOOLS location). Those bullets invite staleness — rename a capability, bump sdk_version, or add a fifth capability and the rule rots silently.

Keep only the two things that aren't derivable from grepping:

1. The split between llms-full.txt (reference) and ai.pydantic.dev/ (guides) — the one fact that saves future sessions from fruitlessly grepping for walkthroughs that have been moved upstream.
2. The policy that tool error handling should flow through the hook rather than in-tool try/except — a behavioral rule, not a code fact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LukeMainwaring and others added 4 commits April 10, 2026 07:23

chore: bump all dependencies to latest versions

a82dd54

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LukeMainwaring merged commit 0d062fa into main Apr 10, 2026
4 checks passed

LukeMainwaring mentioned this pull request Apr 10, 2026

chore: bump deps + adopt pydantic-ai 1.79 patterns LukeMainwaring/samplespace#41

Merged

8 tasks

LukeMainwaring merged 4 commits into main from update-deps/2026-04-10
