fix(pr-agent): always ignore caretaker/pr-readiness in CI eval (self-deadlock) #433
Merged
ianlintner merged 1 commit into main on Apr 21, 2026
Surfaced during live custom-agent e2e on #431: caretaker sees the lint failure, ranks the PR as ``ci_pending``, and waits forever, because its own ``caretaker/pr-readiness`` check is also on the check-runs list and is always either ``pending`` or ``action_required`` while the state machine is still deciding. The state machine then refuses to act until all checks settle, which requires it to act, which… deadlock.

Fix in two places:

1. ``src/caretaker/pr_agent/states.py``: ``evaluate_ci`` now merges the caller's ``ignore_jobs`` with a hard-coded ``_ALWAYS_IGNORED_CHECK_NAMES`` set that includes ``caretaker/pr-readiness``. Every downstream consumer inherits the fix without needing to touch their config.
2. ``.github/maintainer/config.yml``: explicit ignore entry with a comment, so the dogfood config demonstrates the pattern for anyone copying it.

Diagnostic trail:

- Run 24706775007 on PR #431 (deliberate E501 for the e2e): ``PR #431: ci_pending → ci_pending (action: wait)``. FoundryExecutor was ready but never dispatched because ``evaluate_ci`` returned PENDING on the pr-readiness check.
- After fix: ``evaluate_ci`` sees only the upstream checks (``lint`` in FAILURE); CIStatus becomes FAILING; the PR agent builds a ``CopilotTask(LINT_FAILURE)`` and hands it to the dispatcher; the dispatcher finds LINT_FAILURE in the allowlist and routes to Foundry.

Full pytest suite still green (907 passed).
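The merge of ``ignore_jobs`` with the always-ignored set can be sketched roughly as follows. This is a minimal illustration, not the code in ``states.py``: the ``CheckRun`` dataclass, the status strings, the ``CIStatus`` members other than PENDING and FAILING, and the ``always_ignored`` parameter are assumptions made for the example. The pending-before-failure ordering is inferred from the description that the state machine waits until all checks settle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Collection, Iterable


class CIStatus(Enum):
    PASSING = "passing"  # assumed member; only PENDING and FAILING appear in the PR
    PENDING = "pending"
    FAILING = "failing"


@dataclass(frozen=True)
class CheckRun:
    """Hypothetical, simplified view of one entry on the check-runs list."""
    name: str
    status: str  # "success", "failure", "pending" or "action_required"


# Checks that caretaker posts itself and must never gate on.
_ALWAYS_IGNORED_CHECK_NAMES = frozenset({"caretaker/pr-readiness"})


def evaluate_ci(
    check_runs: Iterable[CheckRun],
    ignore_jobs: Collection[str] = (),
    always_ignored: Collection[str] = _ALWAYS_IGNORED_CHECK_NAMES,
) -> CIStatus:
    # Merge the caller's ignore list with the hard-coded set.
    ignored = set(ignore_jobs) | set(always_ignored)
    relevant = [run for run in check_runs if run.name not in ignored]

    # Unsettled checks win: the agent waits until everything has settled.
    if any(run.status in {"pending", "action_required"} for run in relevant):
        return CIStatus.PENDING
    if any(run.status == "failure" for run in relevant):
        return CIStatus.FAILING
    return CIStatus.PASSING


runs = [CheckRun("lint", "failure"), CheckRun("caretaker/pr-readiness", "pending")]
print(evaluate_ci(runs))                              # after the fix
print(evaluate_ci(runs, always_ignored=frozenset()))  # pre-fix behaviour
```

Passing an empty ``always_ignored`` set reproduces the deadlock condition in this sketch: the agent's own pending check keeps the result at PENDING even though ``lint`` has already failed.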
ianlintner added a commit that referenced this pull request on Apr 21, 2026
…f custom executor) (#431)

* docs(readme): add Fleet registry + custom coding agent sections (e2e test)

  Also introduces a deliberate ruff E501 violation in ``src/caretaker/fleet/api.py`` (well outside the code path and marked with a big ``DELIBERATE E2E TEST`` comment) so we can observe the new custom coding agent end-to-end on this PR:

  1. CI fails on ``ruff check`` with the expected E501.
  2. caretaker's PR agent sees the lint failure and constructs a ``CopilotTask(task_type=LINT_FAILURE)``.
  3. ``ExecutorDispatcher.route()`` picks ``provider: auto`` + Foundry eligible + same-repo → dispatches to ``FoundryExecutor.run()``.
  4. Foundry fixes the E501 (reformat / wrap / remove the comment), commits, pushes. CI re-runs green.
  5. PR reaches merge-ready.

  If anything in that loop breaks, the fix lands as a follow-up PR; the deliberate violation can always be cleaned up by a human commit if the agent doesn't finish.

  README additions summarise the real shipped features (fleet registry, custom coding agent, routing labels) so the front page reflects post-sprint state.

* fix(workflow): install llm-multi extra so Foundry executor is reachable

  Discovered during the live e2e test on #431: caretaker's own workflow logs showed: executor.foundry.enabled=True but LiteLLM provider is unavailable (missing credentials or package). Routing stays on Copilot.

  The credentials are present (``ANTHROPIC_API_KEY`` and ``AZURE_AI_API_KEY`` are both set as repo secrets), but the ``litellm`` package itself is only pulled in by the optional ``llm-multi`` extras group, and the install step ran ``pip install .``.

  Fixes:

  - ``.github/workflows/maintainer.yml`` (caretaker dogfood) now installs ``".[llm-multi]"`` so LiteLLM is present in the runner's Python env. Without this, every dispatch cascades through ``LiteLLMProvider.available == False`` and falls back to Copilot, defeating the whole Foundry routing path.
  - ``setup-templates/templates/workflows/maintainer.yml`` (consumer template) installs ``litellm`` as a second pip step. We can't use the ``[llm-multi]`` extras form in the git-URL install because that spec is name-sensitive and caretaker was renamed from ``caretaker`` to ``caretaker-github`` at v0.8.1; a bare URL + a separate ``pip install litellm`` works across the rename boundary.

  Once these changes land and the next caretaker run fires, the Foundry executor should actually attempt the LINT_FAILURE fix on #431 instead of routing straight to Copilot.

* revert(fleet): remove deliberate E501; e2e experiment concluded, custom-agent wiring verified via #432/#433/#434
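Why the missing ``litellm`` package silently forced every dispatch onto Copilot can be illustrated with a small sketch. It assumes, based only on the log message above, that availability requires both the package and at least one credential; ``litellm_available`` and ``choose_executor`` are hypothetical helpers and do not reflect the actual ``LiteLLMProvider`` or ``ExecutorDispatcher`` code.

```python
import importlib.util
import os
from typing import Collection, Mapping, Sequence


def litellm_available(
    env: Mapping[str, str] = os.environ,
    credential_vars: Sequence[str] = ("ANTHROPIC_API_KEY", "AZURE_AI_API_KEY"),
    package: str = "litellm",
) -> bool:
    """Hypothetical check: the package must be importable AND a credential set."""
    package_installed = importlib.util.find_spec(package) is not None
    has_credentials = any(env.get(var) for var in credential_vars)
    return package_installed and has_credentials


def choose_executor(
    task_type: str,
    allowlist: Collection[str],
    provider_available: bool,
) -> str:
    """Hypothetical routing rule: Foundry only if the provider is usable
    and the task type is allowlisted; otherwise fall back to Copilot."""
    if provider_available and task_type in allowlist:
        return "foundry"
    return "copilot"


# With credentials present but the package missing, routing stays on Copilot.
available = litellm_available(
    env={"ANTHROPIC_API_KEY": "set"},
    package="no_such_pkg_caretaker_demo",  # stands in for a missing litellm
)
print(choose_executor("LINT_FAILURE", {"LINT_FAILURE"}, available))
```

Under these assumptions, having the secrets set is not enough: with ``pip install .`` the optional dependency is absent, so the availability check fails and the fallback branch is always taken.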
Live e2e on #431 surfaced a self-gating deadlock: caretaker's own ``caretaker/pr-readiness`` check is posted on every PR and is always pending/action_required while the state machine is still evaluating. ``evaluate_ci`` then kept returning ``PENDING``, so the PR agent decided to wait forever and never handed the lint-failure task to the custom executor.

Adds ``_ALWAYS_IGNORED_CHECK_NAMES = {'caretaker/pr-readiness'}`` to ``evaluate_ci`` so every consumer inherits the fix without needing to touch ``ci.ignore_jobs``. Also added the explicit entry to the caretaker dogfood config for documentation.

Run that proved the bug: 24706775007. Logs show ``PR #431: ci_pending → ci_pending (action: wait)`` despite the ``lint`` check being in FAILURE.

Test plan

- ``uv run pytest``: 907 passed, 0 regressions.
- … ``ci_failing`` and dispatches Foundry.

🤖 Generated with Claude Code