[Maintainer] Upgrade to v0.13.0

## [Maintainer] Upgrade to v0.13.0

Current version: `0.12.0`
Target version: `0.13.0`

⚠️ **This is a breaking release.** Manual review is recommended.

### Upgrade Notes
## Highlights

Phase 1 of the **2026-Q2 agentic migration plan** (see `docs/plans/`) lands. No breaking changes; everything is additive or opt-in. Six PRs merged in one block.

### Added

- **Anthropic prompt caching** (#479) across both providers. `AnthropicProvider.complete` and `LiteLLMProvider.complete` / `.complete_with_tools` annotate the system block with `cache_control: {type: "ephemeral"}`; a `_supports_prompt_cache(model)` substring gate fails open for non-Claude models. Two new Prometheus counters, `caretaker_llm_cache_read_tokens_total` and `caretaker_llm_cache_creation_tokens_total`, labelled `{provider, model}`, let Grafana compute the hit ratio directly. A follow-up R2 spike is queued to evaluate a second cache breakpoint and to measure TTL vs long-running tool loops.

- **`ClaudeClient.structured_complete[T: BaseModel]`** (#477) — the helper that every Phase 2 decision migration is built on. Prepends `schema.model_json_schema()` to the system prompt, calls the provider, parses JSON, validates with pydantic, and on failure re-asks once with the validation error appended before raising `StructuredCompleteError(raw_text, validation_error)`. `LLMConfig.structured_output_retries` tunes the retry budget (default 1). `pr_reviewer/inline_reviewer.py` migrated as the canary; its silent `verdict=COMMENT` downgrade on parse failure is gone.

- **403 scope-gap surfacing** (#480). `ScopeGapTracker` (thread-safe, per-run) captures every 403 from a workflow token that lacks a scope — messages matching "Resource not accessible by integration", the PAT equivalent, or "Must have admin rights". Deduped on `(endpoint_template, http_method)`, endpoint numeric IDs normalised, and mapped to a scope hint via a curated endpoint-to-scope table. After the run, `publish_scope_gap_issue()` upserts a single `[caretaker] Workflow token is missing required scopes` issue (labels `caretaker:scope-gap`, `maintainer:action-required`) with a concrete `permissions:` YAML snippet the consumer can drop into `.github/workflows/maintainer.yml`. New counter `caretaker_github_scope_gap_total{service, scope}`.

- **Orchestrator transient-error exit gate** (#481, T-M1). Run ends with exit 0 when every collected `RunSummary.errors` entry classifies as transient (403s from dependabot / code-scanning / secret-scanning / assignees / pulls APIs, network timeouts, upstream 5xx, empty artifact) AND at least some real work completed. New counter `caretaker_orchestrator_soft_fail_total{category="transient"}` keeps the signal visible. `.github/workflows/maintainer.yml` `upload-artifact` steps now carry `if-no-files-found: ignore` + `continue-on-error: true` so a missing memory snapshot no longer fails the job. Closes the "Unknown caretaker failure" self-heal storm observed on the fleet.

- **Per-signature self-heal storm cap** (#481, T-M7). The cap key is now `(repo, error_signature, hour_window = floor(created_at / 3600))` with a default budget of 5/hour, 20/day per key. One noisy signature in one repo can no longer burn the global cap for everyone else. Regression test simulates a burst of 10 identical failures in 30 seconds and asserts exactly 5 issues are filed.

- **Flux GitOps for the `bigboy` AKS cluster** (#478). Two Flux `Kustomization` CRs under `k8s/flux/clusters/bigboy/caretaker.yaml` — `caretaker` (app bundle, `targetNamespace: caretaker`) and `caretaker-ingress` (Istio VirtualService, `targetNamespace: default`, `dependsOn: caretaker`). Resources stay in `infra/k8s/` and are wrapped by `k8s/apps/caretaker/kustomization.yaml` / `k8s/apps/caretaker-ingress/kustomization.yaml` so hand-apply and GitOps read one source of truth. Mirrors the pattern already running in production for `Example-React-AI-Chat-App`. One-time bootstrap lives outside this PR: `az k8s-configuration flux create --name caretaker ...` against the target cluster.

### Fixed

- **Trailing newline on every consumer-repo file write** (#476). New `caretaker.util.text.ensure_trailing_newline` is applied in both write paths: `foundry/tools._tool_write_file` (the LLM-callable workspace writer) and `github_client/api.GitHubClient.create_or_update_file` (the direct contents-API writer used by `docs_agent`). Closes the Copilot "add EOF newline" fan-out PR chain observed on `python_dsa`, `kubernetes-apply-vscode`, and `flashcards` after every caretaker self-upgrade.

- **Orchestrator-state comment upsert dedupe** (#481, T-M8). `GitHubClient.upsert_issue_comment` now collects every marker-matching comment, sorts newest-first, edits the newest in place, and best-effort deletes older duplicates beyond `max_duplicates_to_retain` (default 2). Closes the 146-comment ballooning observed on `python_dsa` #23. Delete failures are logged, never raised — the upsert's hot path is unaffected.

## Compatibility

- No breaking changes. `min_compatible`: `0.10.0`.
- Prompt caching is enabled automatically on Claude models and no-ops on everything else; no config flag required.
- `structured_complete[T]` is additive; the raw `ClaudeClient.complete` path is unchanged.
- Flux GitOps is opt-in at the cluster level — the repo changes are inert until you run `az k8s-configuration flux create`.

## Test plan

Each constituent PR landed full gates (pytest / ruff / mypy) green before merge. Fleet-side smoke tests deferred to post-merge in production on the five `caretaker`-topic consumer repos; watch for the disappearance of "Unknown caretaker failure" issues and the EOF-newline fan-out chain.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

📋 [Full Changelog](https://github.com/ianlintner/caretaker/releases/tag/v0.13.0)

@copilot Please apply this upgrade.
See `.github/agents/maintainer-upgrade.md` for instructions.


FROM: 0.12.0
TO: 0.13.0
BREAKING: True


**Steps:**
1. Update version pins in `pyproject.toml` / `requirements.txt`
2. Update any workflow references
3. Run tests to verify compatibility
4. Update the version in config if applicable

**Acceptance criteria:**
- [ ] Version updated to target
- [ ] All tests pass
- [ ] No regressions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Maintainer] Upgrade to v0.13.0 #48