Skip build-failure-analysis agent on green builds by Evangelink · Pull Request #8738 · microsoft/testfx

Evangelink · 2026-06-01T14:59:44Z

Summary

The Build Failure Analysis workflow keeps producing red ❌ checks on PRs with perfectly green builds. Example: https://github.com/microsoft/testfx/actions/runs/26760910013.

PR #8726 (merged earlier today) only suppressed the auto-tracked tracking issue; the failed check on the PR itself was unaffected.

Root cause (confirmed from run logs)

The workflow uses gh-aw — there is no agent-job-level if: hook, so the agent runs on every PR.
On a green build, the agent's instructions say to call the noop safe-output and stop.
The Copilot CLI hits a transient mid-conversation flake (Response was interrupted due to a server error).
gh-aw's copilot_harness.cjs hard-codes MAX_RETRIES = 3 and retries with --continue — each retry re-issues the terminating noop call.
Cumulative count: 4 noops. The default cap is 1, so noops 2–4 are rejected, the agent step exits 1, and the PR shows a red X.

Fix

Split the workflow so the actual build runs in a dedicated build job whose outputs gate the AI agent pipeline:

New jobs.build: runs ./build.sh --binaryLog, locates the binlog, installs Microsoft.AITools.BinlogMcp, dumps JSON to /tmp/binlog-data/, and uploads build-failure-analysis-data as an artifact (only on failure). Outputs: outcome, binlog-found, binlog-relative-path. Build step uses continue-on-error so the job itself always succeeds, and exit + "" + \ + "" + prevents a tee glitch from misclassifying a green build as failed.
on.needs: [build] makes pre_activation / activation wait for build.
Top-level if: needs.build.outputs.outcome == 'failure' gates activation — and the agent cascades to skipped — so the AI pipeline never runs on a green build.
The agent job downloads the artifact instead of rebuilding, and only installs NuGet.Mcp.Server (which it uses at runtime for NU#### errors).

For the residual case where the agent still runs (failed builds), I raised the safe-output caps to absorb the harness retry budget:

noop.max: 5 (1 happy-path + 4 retry-amplified)
add-comment.max: 5 (hide-older-comments: true collapses duplicates)
create-pull-request-review-comment.max: 25 (shared body asks for top-5 issues, so 5 × 4 retries = 20)

Slash-command variant

Applied the same split to build-failure-analysis-command.md, plus an explicit refs/pull/<n>/merge checkout in its build job — without it, pull_request_comment events check out the default branch instead of the PR a maintainer ran /analyze-build-failure on.

Verification

Compiled both workflows with gh aw compile --strict (gh-aw v0.75.4): 0 errors, 0 warnings.
Inspected the generated .lock.yml files to confirm:
- pre_activation, activation, and agent all gate on needs.build.outputs.outcome == 'failure'.
- Build job emits the expected outputs and uploads the artifact only on failure.
- Agent job downloads the artifact and exports GH_AW_BUILD_OUTCOME / GH_AW_BINLOG_PATH correctly.
- conclusion: job's if: is not affected (only runs when agent did not skip or activation lockdown failed).

Risk

Low. The change is structural — the agent prompt body (shared/build-failure-analysis-shared.md) is unchanged. The build job runs ./build.sh --binaryLog identically to before (same script, same flags) and the AI agent receives identical inputs (same JSON dumps, same env vars). Worst case the new gating is wrong and the agent doesn't run on a failed build — but that just makes the workflow silently no-op, never a regression vs the current red-X state.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

The Build Failure Analysis workflow runs an AI agent on every PR. On a successful build, the agent's instructions say to call `noop` and stop. But when the Copilot CLI hits a transient mid-conversation flake, gh-aw's copilot_harness retries up to 3 times with `--continue`, and each retry re-emits the terminating `noop` -- producing 4 cumulative noops. The default `noop.max: 1` then rejects noops 2-4, the agent step exits 1, and the PR shows a red X on a perfectly green build. PR microsoft#8726 only suppressed the auto-tracked issue, not the failed check itself. This change splits the workflow so the build runs in a dedicated `build` job whose outputs gate the AI pipeline: * Custom `jobs.build:` runs `./build.sh --binaryLog`, locates the binlog, installs Microsoft.AITools.BinlogMcp, dumps JSON to `/tmp/binlog-data/`, and uploads `build-failure-analysis-data` as an artifact (only on failure). Outputs: `outcome`, `binlog-found`, `binlog-relative-path`. Build step uses `continue-on-error` so the job itself always succeeds, and `exit ${PIPESTATUS[0]}` ensures a `tee` glitch can't misclassify a green build as failed. * `on.needs: [build]` makes pre_activation/activation wait for `build`. * Top-level `if: needs.build.outputs.outcome == 'failure'` gates activation (cascading to the agent), so the AI pipeline never runs on a green build. * The agent job downloads the artifact instead of rebuilding, and installs only NuGet.Mcp.Server (which it uses at runtime for NU#### errors). For the residual case where the agent still runs (failed builds), the safe-output caps are raised to absorb the harness retry budget: `noop.max: 5`, `add-comment.max: 5`, `create-pull-request-review-comment.max: 25`. With `hide-older-comments: true`, duplicate add-comments collapse anyway. The slash-command variant (`/analyze-build-failure`) gets the same split, plus an explicit `refs/pull/<n>/merge` checkout in its build job since `pull_request_comment` events otherwise check out the default branch. Fixes the failing runs at https://github.com/microsoft/testfx/actions/workflows/build-failure-analysis.lock.yml Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR restructures the gh-aw “Build Failure Analysis” workflows so the Copilot agent pipeline is skipped on successful (green) builds, preventing transient AI flakes from surfacing as red ❌ checks on otherwise healthy PRs.

Changes:

Split both workflows into a dedicated build job (produces binlog + JSON dump and uploads an artifact on failure) and an agent pipeline gated on needs.build.outputs.outcome == 'failure'.
Updated the agent job to download the build artifact instead of rebuilding, and adjusted exported GH_AW_* context accordingly.
Increased safe-output caps to tolerate Copilot CLI harness retries without safe-output validation failures.

Show a summary per file

File	Description
.github/workflows/build-failure-analysis.md	Adds a separate `build` job and gates activation/agent execution on build failure; switches agent setup to artifact download.
.github/workflows/build-failure-analysis.lock.yml	Regenerated lock workflow reflecting new job graph, gating, and safe-output limits.
.github/workflows/build-failure-analysis-command.md	Mirrors the same build/agent split for the slash-command workflow, including explicit PR merge-ref checkout.
.github/workflows/build-failure-analysis-command.lock.yml	Regenerated lock workflow reflecting the command variant’s new job graph and gating.
.github/aw/actions-lock.json	Updates action pin entries used by gh-aw compilation (notably `actions/setup-dotnet@v5`).

Copilot's findings

Files reviewed: 5/5 changed files
Comments generated: 4

… against fork PRs * Add github/gh-aw-actions/setup@v0.75.4 SHA pin (9f050961da586148d135e113d8bb025185cdf2b8) to .github/aw/actions-lock.json. Without this entry the gh-aw compiler had no immutable reference and fell back to emitting 'uses: github/gh-aw-actions/setup@v0.75.4' with the tag recorded as the manifest 'sha', defeating strict-mode action pinning. * Add a fork guard on the new 'build' job in build-failure-analysis.md mirroring the workflow's 'forks: []' trigger filter. The gh-aw frontmatter already gates pre_activation/activation/agent on the same expression, but user-defined 'jobs:' entries are emitted verbatim, so without an explicit 'if:' the build job would still run on fork PRs (paying CI time and exposing dotnet-tools auth-gated installs to forks) even though the agent pipeline never executes for them. * Recompile both build-failure-analysis*.lock.yml with gh aw 0.75.4 --strict so 'github/gh-aw-actions/setup' is pinned to its commit SHA everywhere and the build job's new fork guard is reflected in the generated YAML. The slash-command variant intentionally has no fork guard: '/analyze-build-failure' is restricted to roles [admin, maintainer, write] and explicitly checks out 'refs/pull/<n>/merge', so a maintainer can rerun the analysis on a fork PR on demand. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 1, 2026 14:59

Copilot started reviewing on behalf of Evangelink June 1, 2026 14:59 View session

Copilot AI reviewed Jun 1, 2026

View reviewed changes

Comment thread .github/aw/actions-lock.json

Comment thread .github/workflows/build-failure-analysis.lock.yml Outdated

Comment thread .github/workflows/build-failure-analysis-command.lock.yml Outdated

Comment thread .github/workflows/build-failure-analysis.md

Evangelink enabled auto-merge (squash) June 2, 2026 08:09

YuliiaKovalova approved these changes Jun 2, 2026

View reviewed changes

Evangelink merged commit 6c3d766 into microsoft:main Jun 2, 2026
17 checks passed

Evangelink deleted the dev/amauryleve/build-failure-analysis-skip-on-success branch June 2, 2026 08:32

Evangelink mentioned this pull request Jun 11, 2026

Feature: agent-job-level if:/ eeds: hook to gate the agent on a custom setup job github/gh-aw#38564

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Skip build-failure-analysis agent on green builds#8738

Skip build-failure-analysis agent on green builds#8738
Evangelink merged 2 commits into
microsoft:mainfrom
Evangelink:dev/amauryleve/build-failure-analysis-skip-on-success

Evangelink commented Jun 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

Evangelink commented Jun 1, 2026

Summary

Root cause (confirmed from run logs)

Fix

Slash-command variant

Verification

Risk

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants