Feature: classify failure causes (infra vs agent) before report-failure-as-issue triggers #38565

Description

@Evangelink

Summary

safe-outputs.report-failure-as-issue (default true) currently fires on any non-success job status, including transient infra failures that are clearly not actionable by the maintainer (Docker pulls of MCP container images timing out, Copilot/AI provider 5xx, squid firewall startup failures, etc.). For unattended scheduled workflows this produces a steady stream of noise issues that have to be manually closed.

Motivation

We have had to opt three different scheduled workflows out of report-failure-as-issue for exactly this reason:

| Workflow | Concrete cause | Workaround PR |
| --- | --- | --- |
| build-failure-analysis | Copilot/AI server flake during agent run on otherwise-green PRs | microsoft/testfx#8726 |
| adhoc-qa | Transient Docker registry timeouts pulling MCP images | microsoft/testfx#8734 |
| sub-issue-closer | Same; daily scheduled, MCP image pull intermittent | microsoft/testfx#9000 |

Each of those PRs is a one-line report-failure-as-issue: false opt-out, which is the wrong tradeoff: it also silences real agent-side failures that would be worth a tracking issue.

The closest existing tracker I found is #26069 ("Systemic MCP registry 401 failures block all agentic workflow safe outputs"), which is about the symptom but not about the noise classification.

Proposed solution (any of these would help)

  1. Failure classification. Tag each failure with a category (infra-pull, mcp-401, firewall-unhealthy, agent-error, user-script-error, safe-output-validation, …) based on which step failed, and let report-failure-as-issue filter:

    safe-outputs:
      report-failure-as-issue:
        categories: [agent-error, safe-output-validation]   # not infra

  2. Smarter default for scheduled triggers. When schedule is the only trigger under on:, default report-failure-as-issue to false and require explicit opt-in. Scheduled workflows are the noisiest pattern; PR-triggered ones already have the PR-check signal.

  3. Built-in dedupe. When an issue is already open for the same (workflow, failure-category) pair, add a comment to it instead of opening a new one.
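
To make the three proposals concrete, here is a minimal Python sketch of how they could compose. It is not how gh-aw is implemented: the step-name fragments, the function names, and the `workflow:category` dedupe key are all hypothetical, chosen only for illustration. The category names come from proposal 1. It also assumes that an explicit `categories:` list counts as opting in for schedule-only workflows.

```python
from typing import List, Optional, Set

# Hypothetical rules: substring of the failed step's name -> failure category.
# The fragments are illustrative assumptions, not real gh-aw step names.
STEP_CATEGORY_RULES = [
    ("pull mcp", "infra-pull"),
    ("mcp registry", "mcp-401"),
    ("firewall", "firewall-unhealthy"),
    ("validate safe output", "safe-output-validation"),
    ("run agent", "agent-error"),
]
FALLBACK_CATEGORY = "user-script-error"


def classify_failure(failed_step: str) -> str:
    """Proposal 1: derive a category from which step failed."""
    name = failed_step.lower()
    for fragment, category in STEP_CATEGORY_RULES:
        if fragment in name:
            return category
    return FALLBACK_CATEGORY


def should_report(
    category: str,
    triggers: Set[str],
    enabled: Optional[bool] = None,
    categories: Optional[List[str]] = None,
) -> bool:
    """Decide whether a failure should produce an issue.

    enabled=None means the workflow did not set report-failure-as-issue.
    Proposal 2: in that case, schedule-only workflows default to off,
    unless a categories filter was given (treated here as opt-in).
    """
    if enabled is None:
        enabled = categories is not None or triggers != {"schedule"}
    if not enabled:
        return False
    # No filter means every category is reported.
    return categories is None or category in categories


def dedupe_key(workflow: str, category: str) -> str:
    """Proposal 3: one tracking issue per (workflow, failure-category)."""
    return f"{workflow}:{category}"


def plan_action(workflow: str, category: str, open_issue_keys: Set[str]) -> str:
    """Return "comment" if a matching issue is already open, else "create"."""
    if dedupe_key(workflow, category) in open_issue_keys:
        return "comment"
    return "create"
```

With the filter from proposal 1, `should_report("infra-pull", {"schedule"}, categories=["agent-error", "safe-output-validation"])` is false while the same call with `"agent-error"` is true, which is the behavior the three opt-out PRs above could not express.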

Related

Environment

  • gh-aw: v0.75.x
