Skip build-failure-analysis agent on green builds #8738
Merged
Evangelink merged 2 commits on Jun 2, 2026
The Build Failure Analysis workflow runs an AI agent on every PR. On a successful build, the agent's instructions say to call `noop` and stop. But when the Copilot CLI hits a transient mid-conversation flake, gh-aw's copilot_harness retries up to 3 times with `--continue`, and each retry re-emits the terminating `noop` -- producing 4 cumulative noops. The default `noop.max: 1` then rejects noops 2-4, the agent step exits 1, and the PR shows a red X on a perfectly green build. PR microsoft#8726 only suppressed the auto-tracked issue, not the failed check itself.

This change splits the workflow so the build runs in a dedicated `build` job whose outputs gate the AI pipeline:

* Custom `jobs.build:` runs `./build.sh --binaryLog`, locates the binlog, installs Microsoft.AITools.BinlogMcp, dumps JSON to `/tmp/binlog-data/`, and uploads `build-failure-analysis-data` as an artifact (only on failure). Outputs: `outcome`, `binlog-found`, `binlog-relative-path`. Build step uses `continue-on-error` so the job itself always succeeds, and `exit ${PIPESTATUS[0]}` ensures a `tee` glitch can't misclassify a green build as failed.
* `on.needs: [build]` makes pre_activation/activation wait for `build`.
* Top-level `if: needs.build.outputs.outcome == 'failure'` gates activation (cascading to the agent), so the AI pipeline never runs on a green build.
* The agent job downloads the artifact instead of rebuilding, and installs only NuGet.Mcp.Server (which it uses at runtime for NU#### errors).

For the residual case where the agent still runs (failed builds), the safe-output caps are raised to absorb the harness retry budget: `noop.max: 5`, `add-comment.max: 5`, `create-pull-request-review-comment.max: 25`. With `hide-older-comments: true`, duplicate add-comments collapse anyway.

The slash-command variant (`/analyze-build-failure`) gets the same split, plus an explicit `refs/pull/<n>/merge` checkout in its build job since `pull_request_comment` events otherwise check out the default branch.
Fixes the failing runs at https://github.com/microsoft/testfx/actions/workflows/build-failure-analysis.lock.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
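The `exit ${PIPESTATUS[0]}` guard described in the commit message above can be illustrated with a minimal bash sketch. It is not the workflow's actual build step: `fake_build_ok`, `fake_build_fail`, and `run_with_log` are hypothetical stand-ins for `./build.sh --binaryLog` and its logging wrapper, and the log paths are scratch files. The point is only that the first element of `PIPESTATUS` carries the build's own exit code, whatever `tee` does.

```bash
#!/usr/bin/env bash
set -o pipefail  # with pipefail, a failing tee alone would fail the whole pipeline

# Hypothetical stand-ins for ./build.sh: one green build, one failed build.
fake_build_ok()   { echo "build succeeded"; return 0; }
fake_build_fail() { echo "build failed"; return 3; }

# Run a command through tee and return the command's own exit code,
# ignoring the exit code of tee.
run_with_log() {
  local log_file=$1; shift
  "$@" 2>&1 | tee "$log_file"
  return "${PIPESTATUS[0]}"
}

scratch=$(mktemp -d)

ok_status=0
run_with_log "$scratch/build.log" fake_build_ok || ok_status=$?

fail_status=0
run_with_log "$scratch/build.log" fake_build_fail || fail_status=$?

# tee cannot open a log inside a missing directory and exits non-zero,
# but the green build must still be classified as green.
tee_glitch_status=0
run_with_log "$scratch/missing-dir/build.log" fake_build_ok 2>/dev/null || tee_glitch_status=$?

rm -rf "$scratch"
echo "ok=$ok_status fail=$fail_status tee_glitch=$tee_glitch_status"
```

Returning the pipeline's own status instead would report the third run as failed under `pipefail`, which is the misclassification the guard is meant to rule out.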
Pull request overview
This PR restructures the gh-aw “Build Failure Analysis” workflows so the Copilot agent pipeline is skipped on successful (green) builds, preventing transient AI flakes from surfacing as red ❌ checks on otherwise healthy PRs.
Changes:
- Split both workflows into a dedicated `build` job (produces binlog + JSON dump and uploads an artifact on failure) and an agent pipeline gated on `needs.build.outputs.outcome == 'failure'`.
- Updated the agent job to download the build artifact instead of rebuilding, and adjusted exported `GH_AW_*` context accordingly.
- Increased safe-output caps to tolerate Copilot CLI harness retries without safe-output validation failures.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/build-failure-analysis.md | Adds a separate build job and gates activation/agent execution on build failure; switches agent setup to artifact download. |
| .github/workflows/build-failure-analysis.lock.yml | Regenerated lock workflow reflecting new job graph, gating, and safe-output limits. |
| .github/workflows/build-failure-analysis-command.md | Mirrors the same build/agent split for the slash-command workflow, including explicit PR merge-ref checkout. |
| .github/workflows/build-failure-analysis-command.lock.yml | Regenerated lock workflow reflecting the command variant’s new job graph and gating. |
| .github/aw/actions-lock.json | Updates action pin entries used by gh-aw compilation (notably actions/setup-dotnet@v5). |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 4
… against fork PRs

* Add github/gh-aw-actions/setup@v0.75.4 SHA pin (9f050961da586148d135e113d8bb025185cdf2b8) to .github/aw/actions-lock.json. Without this entry the gh-aw compiler had no immutable reference and fell back to emitting 'uses: github/gh-aw-actions/setup@v0.75.4' with the tag recorded as the manifest 'sha', defeating strict-mode action pinning.
* Add a fork guard on the new 'build' job in build-failure-analysis.md mirroring the workflow's 'forks: []' trigger filter. The gh-aw frontmatter already gates pre_activation/activation/agent on the same expression, but user-defined 'jobs:' entries are emitted verbatim, so without an explicit 'if:' the build job would still run on fork PRs (paying CI time and exposing dotnet-tools auth-gated installs to forks) even though the agent pipeline never executes for them.
* Recompile both build-failure-analysis*.lock.yml with gh aw 0.75.4 --strict so 'github/gh-aw-actions/setup' is pinned to its commit SHA everywhere and the build job's new fork guard is reflected in the generated YAML.

The slash-command variant intentionally has no fork guard: '/analyze-build-failure' is restricted to roles [admin, maintainer, write] and explicitly checks out 'refs/pull/<n>/merge', so a maintainer can rerun the analysis on a fork PR on demand.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
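The pinning concern in this commit (a `uses:` line that carries a tag instead of a commit SHA) can be spot-checked with a short bash sketch. This is not how gh-aw's `--strict` mode is implemented. `find_unpinned_uses` is a hypothetical helper, the sample workflow text is made up from the references named in this PR, and the check is deliberately rough: it would also flag local `./` actions and does not handle quoted values.

```bash
#!/usr/bin/env bash

# Read workflow YAML on stdin and print every `uses:` reference whose ref
# is not a full 40-character lowercase hex commit SHA.
find_unpinned_uses() {
  grep -E '^[[:space:]-]*uses:[[:space:]]' \
    | sed -E 's/^[[:space:]-]*uses:[[:space:]]*//; s/[[:space:]]*#.*$//' \
    | grep -Ev '@[0-9a-f]{40}$' || true
}

# Illustrative input only: one SHA-pinned reference and two tag references.
sample_workflow='
steps:
  - uses: github/gh-aw-actions/setup@9f050961da586148d135e113d8bb025185cdf2b8 # v0.75.4
  - uses: github/gh-aw-actions/setup@v0.75.4
  - uses: actions/setup-dotnet@v5
'

unpinned=$(printf '%s\n' "$sample_workflow" | find_unpinned_uses)
printf '%s\n' "$unpinned"
```

Piping one of the regenerated `.lock.yml` files into the same function would list any references that still carry a tag.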
YuliiaKovalova
approved these changes
Jun 2, 2026
Summary
The Build Failure Analysis workflow keeps producing red ❌ checks on PRs with perfectly green builds. Example: https://github.com/microsoft/testfx/actions/runs/26760910013.
PR #8726 (merged earlier today) only suppressed the auto-tracked issue; the failed check on the PR itself was unaffected.
Root cause (confirmed from run logs)
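- […] `if:` hook, so the agent runs on every PR.
- On a successful build, the agent's instructions say to call the `noop` safe-output and stop.
- The Copilot CLI hits a transient mid-conversation flake (`Response was interrupted due to a server error`).
- `copilot_harness.cjs` hard-codes `MAX_RETRIES = 3` and retries with `--continue`; each retry re-issues the terminating `noop` call.

Fix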
Split the workflow so the actual build runs in a dedicated `build` job whose outputs gate the AI agent pipeline:

- Custom `jobs.build:` runs `./build.sh --binaryLog`, locates the binlog, installs `Microsoft.AITools.BinlogMcp`, dumps JSON to `/tmp/binlog-data/`, and uploads `build-failure-analysis-data` as an artifact (only on failure). Outputs: `outcome`, `binlog-found`, `binlog-relative-path`. The build step uses `continue-on-error` so the job itself always succeeds, and `exit ${PIPESTATUS[0]}` prevents a `tee` glitch from misclassifying a green build as failed.
- `on.needs: [build]` makes `pre_activation`/`activation` wait for `build`.
- Top-level `if: needs.build.outputs.outcome == 'failure'` gates `activation`, and the agent cascades to skipped, so the AI pipeline never runs on a green build.
- The agent job downloads the artifact instead of rebuilding, and installs only `NuGet.Mcp.Server` (which it uses at runtime for `NU####` errors).

For the residual case where the agent still runs (failed builds), I raised the safe-output caps to absorb the harness retry budget:
- `noop.max: 5` (1 happy-path + 4 retry-amplified)
- `add-comment.max: 5` (`hide-older-comments: true` collapses duplicates)
- `create-pull-request-review-comment.max: 25` (shared body asks for top-5 issues, so 5 × 4 retries = 20)

Slash-command variant
Applied the same split to `build-failure-analysis-command.md`, plus an explicit `refs/pull/<n>/merge` checkout in its build job. Without it, `pull_request_comment` events check out the default branch instead of the PR a maintainer ran `/analyze-build-failure` on.

Verification
- `gh aw compile --strict` (gh-aw v0.75.4): 0 errors, 0 warnings.
- […] `.lock.yml` files to confirm:
  - `pre_activation`, `activation`, and `agent` all gate on `needs.build.outputs.outcome == 'failure'`.
  - […] `GH_AW_BUILD_OUTCOME` / `GH_AW_BINLOG_PATH` correctly.
  - The `conclusion:` job's `if:` is not affected (only runs when agent did not skip or activation lockdown failed).

Risk
Low. The change is structural: the agent prompt body (`shared/build-failure-analysis-shared.md`) is unchanged. The build job runs `./build.sh --binaryLog` identically to before (same script, same flags) and the AI agent receives identical inputs (same JSON dumps, same env vars). Worst case the new gating is wrong and the agent doesn't run on a failed build, but that just makes the workflow silently no-op, never a regression vs the current red-X state.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
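The retry arithmetic behind the raised caps can be sketched in bash. This is a back-of-the-envelope model, not gh-aw's validation logic: `rejected_calls` is a hypothetical helper, and it assumes the harness re-emits the same number of safe-output calls on the first attempt and on each of the `MAX_RETRIES = 3` retries.

```bash
#!/usr/bin/env bash

# Number of safe-output calls a cap would reject, assuming every attempt
# (the initial run plus each retry) emits the same number of calls.
rejected_calls() {
  local calls_per_attempt=$1 retries=$2 cap=$3
  local total=$(( calls_per_attempt * (1 + retries) ))
  local over=$(( total - cap ))
  if (( over < 0 )); then
    over=0
  fi
  echo "$over"
}

old_noop=$(rejected_calls 1 3 1)          # noop.max: 1 -> 4 noops, 3 rejected
new_noop=$(rejected_calls 1 3 5)          # noop.max: 5 -> all 4 accepted
review_comments=$(rejected_calls 5 3 25)  # top-5 issues x 4 attempts = 20, under 25

echo "rejected: old_noop=$old_noop new_noop=$new_noop review_comments=$review_comments"
```

Under these assumptions the old cap rejects three of the four noops, matching the "rejects noops 2-4" failure in the commit message, while the new caps leave headroom in both cases.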