fix(pr-agent): always ignore caretaker/pr-readiness in CI eval (self-deadlock) by ianlintner · Pull Request #433 · ianlintner/caretaker

ianlintner · 2026-04-21T06:09:43Z

Live e2e on #431 surfaced a self-gating deadlock: caretaker's own caretaker/pr-readiness check is posted on every PR and is always pending/action_required while the state machine is still evaluating. evaluate_ci then kept returning PENDING, so the PR agent decided to wait — forever — and never handed the lint-failure task to the custom executor.

Adds _ALWAYS_IGNORED_CHECK_NAMES = {'caretaker/pr-readiness'} to evaluate_ci so every consumer inherits the fix without touching ci.ignore_jobs. Also adds the explicit entry to the caretaker dogfood config as documentation.
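A minimal sketch of the merge described above, not the code in src/caretaker/pr_agent/states.py. Only evaluate_ci, _ALWAYS_IGNORED_CHECK_NAMES, ignore_jobs, the PENDING/FAILING statuses and the caretaker/pr-readiness name come from this PR; the CheckRun dataclass, the PASSING member, the function signature and the pending-before-failing precedence are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional

# Named in the PR: checks that are skipped regardless of caller config.
_ALWAYS_IGNORED_CHECK_NAMES = frozenset({"caretaker/pr-readiness"})


class CIStatus(Enum):
    PENDING = "pending"
    FAILING = "failing"
    PASSING = "passing"  # assumed member, not named in the PR


@dataclass(frozen=True)
class CheckRun:
    """Hypothetical shape of one check-run record."""
    name: str
    status: str                       # e.g. "queued", "in_progress", "completed"
    conclusion: Optional[str] = None  # e.g. "success", "failure"


def evaluate_ci(check_runs: Iterable[CheckRun],
                ignore_jobs: Iterable[str] = ()) -> CIStatus:
    # Merge the caller's ignore list with the hard-coded set, so the
    # self-posted readiness check can never hold the result at PENDING.
    ignored = _ALWAYS_IGNORED_CHECK_NAMES | set(ignore_jobs)
    relevant = [c for c in check_runs if c.name not in ignored]

    # Assumed precedence: any unfinished check makes the whole result PENDING.
    if any(c.status != "completed" for c in relevant):
        return CIStatus.PENDING
    if any(c.conclusion not in ("success", "neutral", "skipped") for c in relevant):
        return CIStatus.FAILING
    return CIStatus.PASSING
```

With a failed lint check and an in-progress caretaker/pr-readiness check, this sketch returns CIStatus.FAILING. Without the hard-coded set, the same input would return PENDING, which is the wait loop the PR describes.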

Run that proved the bug: 24706775007 — logs show PR #431: ci_pending → ci_pending (action: wait) despite the lint check being in FAILURE.

Test plan

uv run pytest — 907 passed, 0 regressions.
Next caretaker run on #431 (docs(readme): fleet-registry + custom-agent sections, live e2e test of custom executor) transitions to ci_failing and dispatches Foundry.

🤖 Generated with Claude Code

Surfaced during live custom-agent e2e on #431: caretaker sees the lint failure, ranks the PR as ``ci_pending``, and waits forever, because its own ``caretaker/pr-readiness`` check is also on the check-runs list and is always either ``pending`` or ``action_required`` while the state machine is still deciding. The state machine then refuses to act until all checks settle, which requires it to act, which… deadlock.

Fix in two places:

1. ``src/caretaker/pr_agent/states.py``: ``evaluate_ci`` now merges the caller's ``ignore_jobs`` with a hard-coded ``_ALWAYS_IGNORED_CHECK_NAMES`` set that includes ``caretaker/pr-readiness``. Every downstream consumer inherits the fix without needing to touch their config.
2. ``.github/maintainer/config.yml``: explicit ignore entry with a comment so the dogfood config demonstrates the pattern for anyone copying it.

Diagnostic trail:

- Run 24706775007 on PR #431 (deliberate E501 for the e2e): ``PR #431: ci_pending → ci_pending (action: wait)``. FoundryExecutor was ready but never dispatched because evaluate_ci returned PENDING on the pr-readiness check.
- After fix: evaluate_ci sees only the upstream checks (``lint`` in FAILURE); CIStatus becomes FAILING; PR agent builds a ``CopilotTask(LINT_FAILURE)`` and hands it to the dispatcher; dispatcher finds LINT_FAILURE in the allowlist and routes to Foundry.

Full pytest suite still green (907 passed).
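The post-fix hand-off in the diagnostic trail can be sketched roughly as follows. CopilotTask, LINT_FAILURE, ExecutorDispatcher.route() and the allowlist are named in the commit messages on this page; the TaskType enum, the same_repo and foundry_available fields, the string return values and the fallback to Copilot for non-allowlisted tasks are assumptions made for illustration, not caretaker's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class TaskType(Enum):
    """Hypothetical enum; only LINT_FAILURE is named on this page."""
    LINT_FAILURE = "lint_failure"
    TEST_FAILURE = "test_failure"  # assumed extra member, for contrast


@dataclass(frozen=True)
class CopilotTask:
    task_type: TaskType
    pr_number: int
    same_repo: bool = True  # assumed field: PR branch lives in the base repo


class ExecutorDispatcher:
    def __init__(self, foundry_allowlist: Iterable[TaskType],
                 foundry_available: bool = True) -> None:
        self.foundry_allowlist = frozenset(foundry_allowlist)
        self.foundry_available = foundry_available

    def route(self, task: CopilotTask) -> str:
        """Return the name of the executor that should handle the task."""
        if (
            self.foundry_available
            and task.same_repo
            and task.task_type in self.foundry_allowlist
        ):
            return "foundry"
        # Assumed default: anything not eligible for Foundry stays on Copilot.
        return "copilot"
```

Under these assumptions, a LINT_FAILURE task on a same-repo PR routes to Foundry, while a task type outside the allowlist, or any task when the provider is unavailable, stays on Copilot.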


…f custom executor) (#431)

* docs(readme): add Fleet registry + custom coding agent sections (e2e test)

  Also introduces a deliberate ruff E501 violation in ``src/caretaker/fleet/api.py`` (well outside the code-path and marked with a big ``DELIBERATE E2E TEST`` comment) so we can observe the new custom coding agent end-to-end on this PR:

  1. CI fails on ``ruff check`` with the expected E501.
  2. caretaker's PR agent sees the lint failure and constructs a ``CopilotTask(task_type=LINT_FAILURE)``.
  3. ``ExecutorDispatcher.route()`` picks ``provider: auto`` + Foundry eligible + same-repo → dispatches to ``FoundryExecutor.run()``.
  4. Foundry fixes the E501 (reformat / wrap / remove the comment), commits, pushes. CI re-runs green.
  5. PR reaches merge-ready.

  If anything in that loop breaks, the fix lands as a follow-up PR; the deliberate violation can always be cleaned up by a human commit if the agent doesn't finish.

  README additions summarise the real shipped features (fleet registry, custom coding agent, routing labels) so the front page reflects post-sprint state.

* fix(workflow): install llm-multi extra so Foundry executor is reachable

  Discovered during the live e2e test on #431: caretaker's own workflow logs showed

  ``executor.foundry.enabled=True but LiteLLM provider is unavailable (missing credentials or package). Routing stays on Copilot.``

  The credentials are present (``ANTHROPIC_API_KEY`` and ``AZURE_AI_API_KEY`` are both set as repo secrets), but the ``litellm`` package itself is only pulled in by the optional ``llm-multi`` extras group, and the install step ran ``pip install .``.

  Fixes:

  - ``.github/workflows/maintainer.yml`` (caretaker dogfood) now installs ``".[llm-multi]"`` so LiteLLM is present in the runner's Python env. Without this, every dispatch cascades through ``LiteLLMProvider.available == False`` and falls back to Copilot, defeating the whole Foundry routing path.
  - ``setup-templates/templates/workflows/maintainer.yml`` (consumer template) installs ``litellm`` as a second pip step. We can't use the ``[llm-multi]`` extras form in the git-URL install because that spec is name-sensitive and caretaker was renamed from ``caretaker`` to ``caretaker-github`` at v0.8.1; a bare URL + a separate ``pip install litellm`` works across the rename boundary.

  Once these changes land and the next caretaker run fires, the Foundry executor should actually attempt the LINT_FAILURE fix on #431 instead of routing straight to Copilot.

* revert(fleet): remove deliberate E501 - e2e experiment concluded, custom-agent wiring verified via #432/#433/#434
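The "missing credentials or package" condition in the workflow fix can be illustrated with a small availability probe. This is a sketch under stated assumptions, not caretaker's ``LiteLLMProvider``: the function name ``litellm_available``, its injectable ``find_spec`` parameter, and the rule that either one of the two secrets is sufficient are all hypothetical. Only the ``litellm`` package name and the two environment variable names come from the commit message.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional

# Secret names taken from the commit message; treating either one as
# sufficient is an assumption of this sketch.
_CREDENTIAL_VARS = ("ANTHROPIC_API_KEY", "AZURE_AI_API_KEY")


def litellm_available(
    env: Optional[Mapping[str, str]] = None,
    find_spec: Callable[[str], object] = importlib.util.find_spec,
) -> bool:
    """Return True only if the package is importable AND a credential is set."""
    env = os.environ if env is None else env
    # find_spec returns None when a top-level module is not installed,
    # which is the case when the install step ran plain `pip install .`.
    has_package = find_spec("litellm") is not None
    has_credentials = any(env.get(name) for name in _CREDENTIAL_VARS)
    return has_package and has_credentials
```

Keeping the two conditions separate shows why setting the secrets alone was not enough: with the package absent, the probe stays False no matter which credentials are present.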

ianlintner merged commit dccba36 into main Apr 21, 2026
9 of 11 checks passed

ianlintner deleted the fix/ignore-self-readiness-check branch April 21, 2026 06:10

ianlintner added a commit that referenced this pull request Apr 21, 2026

revert(fleet): remove deliberate E501 - e2e experiment concluded, custom-agent wiring verified via #432/#433/#434

e3a8a00

the-care-taker Bot mentioned this pull request Apr 21, 2026

docs: reconcile CHANGELOG — 2026-W17 #435

Merged

github-actions Bot mentioned this pull request Apr 21, 2026

[Maintainer] Upgrade to v0.10.1 ianlintner/rust-oauth2-server#202

Closed

3 tasks

Copilot AI mentioned this pull request Apr 21, 2026

chore: upgrade caretaker from v0.9.0 to v0.10.1 ianlintner/rust-oauth2-server#203

Merged

github-actions Bot mentioned this pull request Apr 21, 2026

[Maintainer] Upgrade to v0.10.1 ianlintner/portfolio#167

Closed

3 tasks

Copilot AI mentioned this pull request Apr 21, 2026

chore(maintainer): upgrade caretaker to v0.10.1 ianlintner/portfolio#169

Closed

github-actions Bot mentioned this pull request Apr 21, 2026

[Maintainer] Upgrade to v0.10.1 ianlintner/python_dsa#44

Closed

3 tasks

Copilot AI mentioned this pull request Apr 21, 2026

chore: upgrade caretaker from v0.10.0 to v0.10.1 ianlintner/python_dsa#45

Closed

4 tasks

github-actions Bot mentioned this pull request Apr 21, 2026

[Maintainer] Upgrade to v0.10.1 ianlintner/kubernetes-apply-vscode#14

Closed

3 tasks

Copilot AI mentioned this pull request Apr 21, 2026

chore: upgrade caretaker to v0.10.1 ianlintner/kubernetes-apply-vscode#16

Closed

ianlintner merged 1 commit into main from fix/ignore-self-readiness-check
