[aw-failures] [aw] Skillet floods Actions with startup-failures on copilot/* branch pushes (recurring — 73 failed runs / 6h as o [Content truncated due to length] #40447

Overview

In the 6h window ending 2026-06-20 08:00 UTC, 21 of 23 agentic-workflow failures (91%) were the Skillet workflow (.github/workflows/skillet.lock.yml). Every run is an instantaneous startup-failure with no agent activity. No existing agentic-workflows issue tracks this signature.

Key metrics (representative run §27862450162)

  1. conclusion = failure, event = push, headBranch = copilot/update-custom-css-themes.
  2. Duration = 0, ActionMinutes = 0, Turns = 0, TokenUsage = 0, ErrorCount = 0.
  3. createdAt == startedAt == updatedAt (06:08:33Z) — the run never progressed past scheduling.
  4. Audit classification: "Workflow 'Skillet' failed before agent activation — no error logs were available to analyze."
  5. Run-level metadata reports zero failed jobs and zero failed steps — i.e. no job was ever dispatched (startup_failure signature).
  6. audit-diff firewall comparison across runs 27862450162 / 27859225167 / 27856881609: 0 new domains, 0 status changes, 0 anomalies — a stable, systemic signature with no drift.
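
The signature in the metrics above can be checked mechanically. The following is a minimal sketch, assuming a run summary is available as a dict. The keys `event` and `headBranch` appear in the report; `conclusion`, `duration`, `createdAt`, `startedAt`, `updatedAt`, and `failedJobs` are illustrative names that may not match the real run_summary.json schema.

```python
def is_startup_failure(run: dict) -> bool:
    """True when a run matches the zero-duration, no-job signature."""
    # Key names other than "event" and "headBranch" are assumptions.
    timestamps = {run.get("createdAt"), run.get("startedAt"), run.get("updatedAt")}
    return (
        run.get("conclusion") == "failure"
        and run.get("duration", 0) == 0
        and run.get("failedJobs", 0) == 0
        and len(timestamps) == 1      # createdAt == startedAt == updatedAt
        and None not in timestamps    # all three timestamps must be present
    )


def is_copilot_push(run: dict) -> bool:
    """True when the run was scheduled by a push to a copilot/* branch."""
    return run.get("event") == "push" and run.get("headBranch", "").startswith("copilot/")


# Values taken from the representative run described above.
representative = {
    "conclusion": "failure",
    "event": "push",
    "headBranch": "copilot/update-custom-css-themes",
    "duration": 0,
    "failedJobs": 0,
    "createdAt": "06:08:33Z",
    "startedAt": "06:08:33Z",
    "updatedAt": "06:08:33Z",
}

matches_signature = is_startup_failure(representative) and is_copilot_push(representative)
```

A run that actually dispatched a job would have a non-zero duration or diverging timestamps and would not match.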

Affected workflow and run IDs

  • Workflow: Skillet (.github/workflows/skillet.lock.yml), private: true, slash_command (centralized, name: "*").
  • 21 failed runs, all push events on copilot/* branches: 27862450162, 27862210273, 27862038117, 27861738057, 27859225167, 27859181269, 27859173283, 27859099139, 27859022467, 27858955332, 27858921530, 27858868996, 27858836010, 27858612004, 27858594866, 27858560026, 27858531804, 27858098105, 27857778483, 27857320814, 27856881609.

Probable root cause

The Skillet source on main triggers only on workflow_dispatch (centralized slash-command dispatch via agentic_commands.yml). The failing runs, however, fire on event = push against copilot/* feature branches. The most consistent explanation is that those branches carry a stale or divergent compiled skillet.lock.yml whose on: block still matches push events (main's lock was recompiled at 2026-06-20 07:59, after all observed failures). GitHub schedules that branch-local workflow on each Copilot push; it fails at startup (0s, no jobs, no logs) because the run context does not match a valid slash-command activation. Result: every Copilot agent push emits one red Skillet run.

Evidence detail
  • run_summary.json for 27862450162: "event": "push", "headBranch": "copilot/update-custom-css-themes", "displayTitle": "Fix docs text contrast and Safari backdrop blur support", Duration 0, Turns 0.
  • main skillet.lock.yml on: block contains only workflow_dispatch (+ internal aw_context input); there is no push/pull_request trigger, confirming the running config diverges from main.
  • No firewall, MCP, or tooling anomalies (audit-diff clean) — rules out network/proxy regression.

Proposed remediation

  1. Confirm the on: triggers of skillet.lock.yml as it exists on a live copilot/* branch (e.g. copilot/update-custom-css-themes) and diff against main's compiled lock.
  2. If branch locks are stale, recompile/rebase Copilot branches, or have the compiler emit a guard so a centralized slash-command lock cannot be scheduled by push/pull_request on feature branches (e.g. an early if on github.event_name == 'workflow_dispatch').
  3. Consider scoping Skillet (and other private centralized commands) so branch pushes do not schedule them at all, eliminating the startup-failure noise.
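
Step 1 amounts to comparing the trigger sets of the two lock files. The sketch below assumes the `on:` blocks have already been parsed into Python values. The `main_on` shape follows the report (workflow_dispatch with an aw_context input), while `branch_on` is a hypothetical stale lock invented for illustration; the real branch contents have not been confirmed.

```python
def trigger_names(on_block) -> set:
    """Normalize a workflow `on:` value (string, list, or mapping) to event names."""
    if on_block is None:
        return set()
    if isinstance(on_block, str):
        return {on_block}
    return set(on_block)  # a list yields its items, a mapping yields its keys


def unexpected_triggers(main_on, branch_on) -> list:
    """Events the branch lock responds to that main's lock does not."""
    return sorted(trigger_names(branch_on) - trigger_names(main_on))


# main's lock, per the report: workflow_dispatch only, with an aw_context input.
main_on = {"workflow_dispatch": {"inputs": {"aw_context": {}}}}

# HYPOTHETICAL stale branch lock, used only to exercise the comparison.
branch_on = {"push": {"branches": ["copilot/*"]}, "workflow_dispatch": {}}

extra = unexpected_triggers(main_on, branch_on)
```

If the files are loaded with a YAML 1.1 parser such as PyYAML, note that an unquoted `on` key is read as the boolean `True`, so the block may need to be looked up under that key.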

Success criteria / verification

  1. Zero Skillet runs with event = push on copilot/* branches over a subsequent 24h window.
  2. Skillet failure count in the failure-investigator 6h window drops from 21 to ~0 absent a real slash-command failure.
  3. Any future Skillet failure carries an attributed failed job/step and non-zero duration (i.e. real execution, not startup_failure).
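
The three criteria can be evaluated over a list of run records for the verification window. This is a sketch under assumptions: the record keys (`workflow`, `event`, `headBranch`, `conclusion`, `duration`, `failedJobs`) are illustrative and would need to be mapped onto whatever the failure-investigator actually exports.

```python
def check_success_criteria(runs: list) -> dict:
    """Evaluate the verification criteria over run records from one window."""
    skillet = [r for r in runs if r.get("workflow") == "Skillet"]

    # Criterion 1: no Skillet runs triggered by pushes to copilot/* branches.
    push_runs = [
        r for r in skillet
        if r.get("event") == "push" and r.get("headBranch", "").startswith("copilot/")
    ]

    # Criterion 2: Skillet failure count in the window.
    failures = [r for r in skillet if r.get("conclusion") == "failure"]

    # Criterion 3: every failure has a failed job/step and non-zero duration.
    unattributed = [
        r for r in failures
        if r.get("duration", 0) == 0 or r.get("failedJobs", 0) == 0
    ]

    return {
        "no_copilot_push_runs": not push_runs,
        "failure_count": len(failures),
        "all_failures_attributed": not unattributed,
    }


# Illustrative window: one startup-failure-style run and one real failure.
sample_runs = [
    {"workflow": "Skillet", "event": "push", "headBranch": "copilot/example",
     "conclusion": "failure", "duration": 0, "failedJobs": 0},
    {"workflow": "Skillet", "event": "workflow_dispatch", "headBranch": "main",
     "conclusion": "failure", "duration": 95, "failedJobs": 1},
]

result = check_success_criteria(sample_runs)
```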

Existing-issue correlation

References:

Generated by 🔍 [aw] Failure Investigator (6h) · 159.8 AIC · ⌖ 12.7 AIC · ⊞ 4.9K ·

  • expires on Jun 27, 2026, 12:08 AM UTC-08:00

Update — 2026-06-20 19:13 UTC (re-confirmed, escalating)

Fix the Skillet startup-failure flood on copilot/* pushes: this one cluster is now 82% of all agentic-workflow failures (73 of 89 in 6h), up ~3.5× from 21 at first report. Same signature, no drift: event = push, copilot/* head branch, 0s duration, zero jobs/steps/logs, createdAt == updatedAt. The root cause and remediation in the original report are unchanged and verified still-applicable; this is now the top reliability-noise source in the repo and should be prioritized.

Generated by 🔍 [aw] Failure Investigator (6h) · 236 AIC · ⌖ 13.3 AIC · ⊞ 4.9K ·
