Skip to content

Skip build-failure-analysis agent on green builds#8738

Merged
Evangelink merged 2 commits into
microsoft:mainfrom
Evangelink:dev/amauryleve/build-failure-analysis-skip-on-success
Jun 2, 2026
Merged

Skip build-failure-analysis agent on green builds#8738
Evangelink merged 2 commits into
microsoft:mainfrom
Evangelink:dev/amauryleve/build-failure-analysis-skip-on-success

Conversation

@Evangelink

Copy link
Copy Markdown
Member

Summary

The Build Failure Analysis workflow keeps producing red ❌ checks on PRs with perfectly green builds. Example: https://github.com/microsoft/testfx/actions/runs/26760910013.

PR #8726 (merged earlier today) only suppressed the auto-tracked tracking issue; the failed check on the PR itself was unaffected.

Root cause (confirmed from run logs)

  1. The workflow uses gh-aw — there is no agent-job-level if: hook, so the agent runs on every PR.
  2. On a green build, the agent's instructions say to call the noop safe-output and stop.
  3. The Copilot CLI hits a transient mid-conversation flake (Response was interrupted due to a server error).
  4. gh-aw's copilot_harness.cjs hard-codes MAX_RETRIES = 3 and retries with --continue — each retry re-issues the terminating noop call.
  5. Cumulative count: 4 noops. The default cap is 1, so noops 2–4 are rejected, the agent step exits 1, and the PR shows a red X.

Fix

Split the workflow so the actual build runs in a dedicated build job whose outputs gate the AI agent pipeline:

  • New jobs.build: runs ./build.sh --binaryLog, locates the binlog, installs Microsoft.AITools.BinlogMcp, dumps JSON to /tmp/binlog-data/, and uploads build-failure-analysis-data as an artifact (only on failure). Outputs: outcome, binlog-found, binlog-relative-path. Build step uses continue-on-error so the job itself always succeeds, and exit + "" + \ + "" + prevents a tee glitch from misclassifying a green build as failed.
  • on.needs: [build] makes pre_activation / activation wait for build.
  • Top-level if: needs.build.outputs.outcome == 'failure' gates activation — and the agent cascades to skipped — so the AI pipeline never runs on a green build.
  • The agent job downloads the artifact instead of rebuilding, and only installs NuGet.Mcp.Server (which it uses at runtime for NU#### errors).

For the residual case where the agent still runs (failed builds), I raised the safe-output caps to absorb the harness retry budget:

  • noop.max: 5 (1 happy-path + 4 retry-amplified)
  • add-comment.max: 5 (hide-older-comments: true collapses duplicates)
  • create-pull-request-review-comment.max: 25 (shared body asks for top-5 issues, so 5 × 4 retries = 20)

Slash-command variant

Applied the same split to build-failure-analysis-command.md, plus an explicit refs/pull/<n>/merge checkout in its build job — without it, pull_request_comment events check out the default branch instead of the PR a maintainer ran /analyze-build-failure on.

Verification

  • Compiled both workflows with gh aw compile --strict (gh-aw v0.75.4): 0 errors, 0 warnings.
  • Inspected the generated .lock.yml files to confirm:
    • pre_activation, activation, and agent all gate on needs.build.outputs.outcome == 'failure'.
    • Build job emits the expected outputs and uploads the artifact only on failure.
    • Agent job downloads the artifact and exports GH_AW_BUILD_OUTCOME / GH_AW_BINLOG_PATH correctly.
    • conclusion: job's if: is not affected (only runs when agent did not skip or activation lockdown failed).

Risk

Low. The change is structural — the agent prompt body (shared/build-failure-analysis-shared.md) is unchanged. The build job runs ./build.sh --binaryLog identically to before (same script, same flags) and the AI agent receives identical inputs (same JSON dumps, same env vars). Worst case the new gating is wrong and the agent doesn't run on a failed build — but that just makes the workflow silently no-op, never a regression vs the current red-X state.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

The Build Failure Analysis workflow runs an AI agent on every PR. On a
successful build, the agent's instructions say to call `noop` and stop.
But when the Copilot CLI hits a transient mid-conversation flake, gh-aw's
copilot_harness retries up to 3 times with `--continue`, and each retry
re-emits the terminating `noop` -- producing 4 cumulative noops. The
default `noop.max: 1` then rejects noops 2-4, the agent step exits 1,
and the PR shows a red X on a perfectly green build. PR microsoft#8726 only
suppressed the auto-tracked issue, not the failed check itself.

This change splits the workflow so the build runs in a dedicated `build`
job whose outputs gate the AI pipeline:

* Custom `jobs.build:` runs `./build.sh --binaryLog`, locates the
  binlog, installs Microsoft.AITools.BinlogMcp, dumps JSON to
  `/tmp/binlog-data/`, and uploads `build-failure-analysis-data` as an
  artifact (only on failure). Outputs: `outcome`, `binlog-found`,
  `binlog-relative-path`. Build step uses `continue-on-error` so the
  job itself always succeeds, and `exit ${PIPESTATUS[0]}` ensures a
  `tee` glitch can't misclassify a green build as failed.
* `on.needs: [build]` makes pre_activation/activation wait for `build`.
* Top-level `if: needs.build.outputs.outcome == 'failure'` gates
  activation (cascading to the agent), so the AI pipeline never runs on a
  green build.
* The agent job downloads the artifact instead of rebuilding, and
  installs only NuGet.Mcp.Server (which it uses at runtime for NU####
  errors).

For the residual case where the agent still runs (failed builds), the
safe-output caps are raised to absorb the harness retry budget:
`noop.max: 5`, `add-comment.max: 5`,
`create-pull-request-review-comment.max: 25`. With
`hide-older-comments: true`, duplicate add-comments collapse anyway.

The slash-command variant (`/analyze-build-failure`) gets the same
split, plus an explicit `refs/pull/<n>/merge` checkout in its build job
since `pull_request_comment` events otherwise check out the default
branch.

Fixes the failing runs at
https://github.com/microsoft/testfx/actions/workflows/build-failure-analysis.lock.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 1, 2026 14:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures the gh-aw “Build Failure Analysis” workflows so the Copilot agent pipeline is skipped on successful (green) builds, preventing transient AI flakes from surfacing as red ❌ checks on otherwise healthy PRs.

Changes:

  • Split both workflows into a dedicated build job (produces binlog + JSON dump and uploads an artifact on failure) and an agent pipeline gated on needs.build.outputs.outcome == 'failure'.
  • Updated the agent job to download the build artifact instead of rebuilding, and adjusted exported GH_AW_* context accordingly.
  • Increased safe-output caps to tolerate Copilot CLI harness retries without safe-output validation failures.
Show a summary per file
File Description
.github/workflows/build-failure-analysis.md Adds a separate build job and gates activation/agent execution on build failure; switches agent setup to artifact download.
.github/workflows/build-failure-analysis.lock.yml Regenerated lock workflow reflecting new job graph, gating, and safe-output limits.
.github/workflows/build-failure-analysis-command.md Mirrors the same build/agent split for the slash-command workflow, including explicit PR merge-ref checkout.
.github/workflows/build-failure-analysis-command.lock.yml Regenerated lock workflow reflecting the command variant’s new job graph and gating.
.github/aw/actions-lock.json Updates action pin entries used by gh-aw compilation (notably actions/setup-dotnet@v5).

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 4

Comment thread .github/aw/actions-lock.json
Comment thread .github/workflows/build-failure-analysis.lock.yml Outdated
Comment thread .github/workflows/build-failure-analysis-command.lock.yml Outdated
Comment thread .github/workflows/build-failure-analysis.md
… against fork PRs

* Add github/gh-aw-actions/setup@v0.75.4 SHA pin
  (9f050961da586148d135e113d8bb025185cdf2b8) to .github/aw/actions-lock.json.
  Without this entry the gh-aw compiler had no immutable reference and fell
  back to emitting 'uses: github/gh-aw-actions/setup@v0.75.4' with the tag
  recorded as the manifest 'sha', defeating strict-mode action pinning.

* Add a fork guard on the new 'build' job in build-failure-analysis.md
  mirroring the workflow's 'forks: []' trigger filter. The gh-aw frontmatter
  already gates pre_activation/activation/agent on the same expression, but
  user-defined 'jobs:' entries are emitted verbatim, so without an explicit
  'if:' the build job would still run on fork PRs (paying CI time and
  exposing dotnet-tools auth-gated installs to forks) even though the agent
  pipeline never executes for them.

* Recompile both build-failure-analysis*.lock.yml with gh aw 0.75.4 --strict
  so 'github/gh-aw-actions/setup' is pinned to its commit SHA everywhere and
  the build job's new fork guard is reflected in the generated YAML.

The slash-command variant intentionally has no fork guard: '/analyze-build-failure'
is restricted to roles [admin, maintainer, write] and explicitly checks out
'refs/pull/<n>/merge', so a maintainer can rerun the analysis on a fork PR
on demand.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Evangelink Evangelink enabled auto-merge (squash) June 2, 2026 08:09
@Evangelink Evangelink merged commit 6c3d766 into microsoft:main Jun 2, 2026
17 checks passed
@Evangelink Evangelink deleted the dev/amauryleve/build-failure-analysis-skip-on-success branch June 2, 2026 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants