diff --git a/.github/skills/pr-finisher/SKILL.md b/.github/skills/pr-finisher/SKILL.md index f901289728e..7fe109e4a9b 100644 --- a/.github/skills/pr-finisher/SKILL.md +++ b/.github/skills/pr-finisher/SKILL.md @@ -1,38 +1,91 @@ --- name: pr-finisher -description: Prepare an open pull request for merge by fixing local validation and failing checks. +description: Prepare an open pull request for merge from a GitHub Copilot cloud agent. Drives Reviews, local validation, and Mergeable to a ready state. Does not merge, and cannot trigger CI. --- # PR Finisher -Use this skill when asked to get a pull request to a merge-ready state. +Drive an open PR for the current branch to a merge-ready state from a **GitHub Copilot cloud agent**. **Do not merge.** When everything you can act on is done, report ready-for-human-merge and stop. -## Goal +## Execution context -Take the current pull request from "almost done" to "ready to merge" by running the standard local checks, fixing failures, checking GitHub status, and pushing the resulting changes. +This skill runs inside a GitHub Copilot cloud agent, not on a developer's machine. -## Required Workflow +- **The agent's pushes do not trigger CI.** Workflows on the PR will not re-run after the agent commits. Any existing `statusCheckRollup` reflects an earlier HEAD and goes stale the moment the agent pushes. +- **Local `make` targets are the agent's authoritative correctness signal** before push. CI is observational only. +- **Re-running CI is a hand-off to a human** (close/reopen the PR, `workflow_dispatch`, or a push from a maintainer). The agent must surface this in its summary. +- **No watch / no sleep loops.** The agent has no async wait state; one pass + summary + stop. -1. Confirm the pull request context for the current branch. -2. Run `make fmt`. -3. Run the `copilot-review` skill to collect and address all in-scope pull request review comments. -4. Run `make lint` and fix any lint failures. -5. Run `make test-unit` and fix any failing unit tests. -6. Check the current GitHub checks for the pull request and identify every failing check. -7. If `make test` is failing locally or the failing checks show the equivalent failure in CI, fix those failures too. -8. If wasm golden tests fail, or a test fix changes the expected wasm compiler output, run `make update-wasm-golden`. -9. Re-run the affected local validation until it passes. -10. Push the final changes. +## Three merge-ready conditions -## Check Inspection +A PR is merge-ready when **all three** are satisfied. Work them **concurrently**. -- Use GitHub tools to inspect the current pull request status and check runs. -- When the user mentions CI, build, test, or workflow failures, fetch the failing job logs before deciding on a fix. -- Distinguish between failures already covered by local commands (`fmt`, `lint`, `test-unit`, `test`) and unrelated external failures. +| Condition | Definition | Agent's signal | +|---|---|---| +| **Reviews** | Every unresolved review thread is addressed on its merits, replied to, and resolved. Code changes alone do not satisfy this. | `copilot-review` skill + GraphQL `reviewThreads` | +| **Checks** | Local `make fmt` / `make lint` / `make test-unit` / `make test` pass. Last-known CI runs reviewed at log level. | `make` targets locally; `gh pr checks` / `gh run view --log-failed` for prior runs | +| **Mergeable** | PR is OPEN, not draft, `mergeable: MERGEABLE`, not `BEHIND` if the repo requires up-to-date branches. | `gh pr view --json mergeable,mergeStateStatus,state,isDraft` | -## Command Order +Because the agent cannot re-trigger CI, "Checks" is satisfied at the agent's level when **local validation passes and prior CI failures have been root-caused and fixed in the pushed commits**. Final green CI requires a human to re-trigger after the agent stops. -Run commands in this order unless a failure requires an earlier retry: +Top-level PR comments and review bodies are useful feedback but **not** a merge gate. Read and action useful ones; do not block on them. + +## Hard rules + +- **Do not merge.** Never run `gh pr merge`, enable auto-merge, or enqueue. This skill stops at "ready for merge." +- **Do not post stand-alone PR comments.** Only reply on existing review threads / comments that need a response. Do not ping reviewers or CODEOWNERS. +- **Always disable pagers** for `gh`: prefix with `GH_PAGER=""` or pipe through `cat`. Without this, commands hang in non-interactive shells. +- **Never wait for CI to re-run.** No `bash sleep`, no `gh run watch`, no `gh pr checks --watch`, no re-check loop after push. The agent's pushes will not trigger workflows; waiting is futile. +- **Local validation is non-negotiable before push.** Because CI will not re-run, the only correctness gate the agent gets is `make ...` locally. Treat a green local run as the bar. +- **Reviews are not done until reply + resolve both succeed.** Code change alone ≠ thread handled. +- **Smallest fix that works.** Don't change unrelated code. Fix lint before tests. +- **Pre-existing unrelated failures** → identify explicitly in the summary; do not guess-fix. + +## CI-fix anti-patterns (do not do these) + +A failing CI step is a signal, not a nuisance. Even though the agent cannot re-run CI to confirm, the following are **forbidden** and should trigger `ask_user` instead: + +- Disabling, skipping, or neutering shared tooling (build caches, lint rules, type checks, env vars, required checks) to make a failure go away. +- "Temporary" disables with a TODO to re-enable later. They outlive the PR and become permanent. +- Lowering coverage thresholds, removing assertions, or loosening a test until it passes. If the test is wrong about product behavior, fix its **logic** (assertions, fixtures, setup); don't relax it. +- Bundling a workaround with a real fix ("belt and suspenders"). Ship one real fix or escalate. Never both. +- Special-casing one OS/runner to hide a failure on that platform. + +**Anti-pattern test:** if the change would make the failure invisible on future PRs without solving it, stop and escalate. + +**Before declaring a tool broken on a platform:** reproduce locally, check version/config, look for transient causes (timeouts, network, runner state). Most "X is broken on macOS/Windows" reports are transient flakes on healthy tooling. + +**For flaky infra** (caches, registries, runners): prefer narrow fixes — targeted retry, higher timeout, pre-flight health check. If a narrow fix doesn't land in one or two attempts, escalate via `ask_user`. + +## Workflow + +The agent runs this once. There is no monitoring loop. + +### 1. Triage + +```bash +GH_PAGER="" gh pr view --json state,isDraft,reviewDecision,mergeable,mergeStateStatus,statusCheckRollup,headRefOid +GH_PAGER="" gh pr checks +``` + +If merged/closed, report and stop. Otherwise classify each condition as ✅ / ❌ / ⏳ / ❓. The CI snapshot here is your **only** view of CI for this run — capture which checks failed and why before changing anything, because after you push it will be stale. + +### 2. Address Reviews + +Delegate to the `copilot-review` skill. For each unresolved thread: make change → commit → reply → resolve. A thread is not handled until reply + resolve both succeed. Push only after all in-scope threads are handled (batch with other fixes below). + +### 3. Address Mergeable + +```bash +GH_PAGER="" gh pr view --json mergeable,mergeStateStatus +``` + +- `CONFLICTING` → resolve conflicts using the repo's conventions. If you cannot determine the correct resolution, `ask_user`. +- `mergeStateStatus: BEHIND` → update branch from base. After updating, scan the new commits for tooling drift (lockfiles, toolchains, lint configs); re-run installs if manifests changed, and flag drift in the summary so any new errors read as drift, not regressions. + +### 4. Address Checks (local + prior CI) + +**Local validation** — the agent's only correctness signal. Run in order; fix at each step before moving on: ```bash make fmt @@ -41,34 +94,64 @@ make test-unit make test ``` -Only run `make test` after `make test-unit` and lint are clean, or when failing PR checks indicate the broader test suite is still broken. - -## Fix Scope +If a `make test` fix changes wasm compiler output, or wasm golden tests fail: -- Make the smallest changes needed to get the pull request green. -- Fix lint issues before test issues. -- Do not change unrelated code just because it is nearby. -- If a failure is pre-existing and unrelated to the pull request, report that explicitly in your final user or PR update instead of guessing. +```bash +make update-wasm-golden +``` -## Wasm Golden Files +Then re-run the affected tests. -When a `make test` fix changes wasm compiler output or wasm golden tests fail, regenerate the wasm goldens with: +**Prior CI failures** — for each failure captured during triage, pull logs and fix the root cause: ```bash -make update-wasm-golden +GH_PAGER="" gh run view --log-failed +``` + +Classify as: real product/test bug, infra flake, or third-party flake. Apply the fix in the agent's commits and, where possible, **reproduce the fix locally** via the matching `make` target. If the failure can't be reproduced locally (infra-only), state that in the summary so the human re-triggers CI with eyes open. Per anti-pattern rules: 1–2 narrow attempts, then `ask_user`. + +### 5. Push and stop + +Push once, after all fixes are in. **Do not re-check `gh pr checks` expecting a new run.** Print the summary and stop. + +## Summary format + +At the stopping point, print: + +``` +- ✅ Reviews — +- ✅ Checks (local) — +- Checks (CI) — stale after agent push; needs human re-trigger. Prior failures: +- ✅ Mergeable — + +Actions taken: +Hand-off: CI must be re-triggered by a maintainer (close/reopen PR, workflow_dispatch, or push) before merge. +Still needed: ``` -Then re-run the relevant tests. +Status vocabulary: +- ✅ satisfied — checked and passing +- ❌ failing — checked and failing +- ⏳ pending — running, waiting for signal (rare for the agent; never use for the post-push CI state) +- ❓ unknown — could not be checked (API error, indeterminate, or CI stale after agent push). Never use ❌ for this. + +**Translate status into plain language.** Don't write bare labels. Always state explicitly that CI on the agent's HEAD is unverified until a human re-triggers it. + +## Stopping conditions + +- **Ready for merge (pending human CI re-trigger)** — local validation green, Reviews resolved, Mergeable clean. Summarize and stop. +- **Nothing actionable remains** — non-actionable blocker (human approval, external service). Summarize and stop. +- **Truly stuck** — unresolvable conflicts, ambiguous feedback, irreproducible failures. `ask_user` with context. -## Completion Standard +## Completion standard -The task is complete only when all of the following are true: +The task is complete only when all are true: -- `make fmt` has been run. -- The `copilot-review` skill was run and all in-scope review comments were addressed. -- `make lint` passes, or any unrelated pre-existing failure is explicitly identified. -- `make test-unit` passes, or any unrelated pre-existing failure is explicitly identified. -- Failing pull request checks were inspected and summarized. -- `make test` was fixed when it was part of the failing state. -- Wasm goldens were regenerated when required. -- The final changes were pushed. +- `make fmt`, `make lint`, `make test-unit` all pass (or unrelated pre-existing failures explicitly identified). +- `make test` was run and fixed when it was part of the failing state; wasm goldens regenerated when required. +- The `copilot-review` skill addressed all in-scope review threads (reply + resolve succeeded for each). +- Mergeable condition was checked; conflicts resolved and `BEHIND` updated when present. +- Prior CI failures were inspected at the log level and either fixed at the root cause (with a local reproduction where possible) or explicitly flagged as not locally reproducible / escalated. +- Final changes were pushed exactly once at the end. No post-push re-check loop. +- A structured ✅/❌/⏳/❓ summary was printed, including an explicit hand-off line for the human CI re-trigger. +- No `gh pr merge` was run.