[aw-failures] [aw] Skillet floods Actions with startup-failures on copilot/* branch pushes (recurring — 73 failed runs / 6h as o
[Content truncated due to length]

### Overview

In the 6h window ending 2026-06-20 08:00 UTC, 21 of 23 agentic-workflow failures (91%) came from the `Skillet` workflow (`.github/workflows/skillet.lock.yml`). Every one of these runs is an instantaneous startup failure with no agent activity. No existing `agentic-workflows` issue tracks this signature.

### Key metrics (representative run [§27862450162](https://github.com/github/gh-aw/actions/runs/27862450162))

1. `conclusion = failure`, `event = push`, `headBranch = copilot/update-custom-css-themes`.
2. `Duration = 0`, `ActionMinutes = 0`, `Turns = 0`, `TokenUsage = 0`, `ErrorCount = 0`.
3. `createdAt == startedAt == updatedAt` (06:08:33Z) — the run never progressed past scheduling.
4. Audit classification: "Workflow 'Skillet' failed before agent activation — no error logs were available to analyze."
5. Run-level metadata reports zero failed jobs and zero failed steps — i.e. no job was ever dispatched (startup_failure signature).
6. `audit-diff` firewall comparison across runs 27862450162 / 27859225167 / 27856881609: 0 new domains, 0 status changes, 0 anomalies — a stable, systemic signature with no drift.
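
The signature listed above can be written as a small predicate. This is a minimal sketch, assuming each run is a dict shaped like the output of `gh run view --json` with the `event`, `headBranch`, `conclusion`, `createdAt`, `startedAt`, `updatedAt`, and `jobs` fields. The function names are hypothetical and not part of gh-aw, and the date in the sample timestamps is assumed from the reporting window.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp such as '2026-06-20T06:08:33Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_startup_failure(run: dict) -> bool:
    """True when a run failed without progressing past scheduling."""
    created, started, updated = (
        parse_ts(run[key]) for key in ("createdAt", "startedAt", "updatedAt")
    )
    never_progressed = created == started == updated
    no_jobs = not run.get("jobs")
    # Assumption: the conclusion may be reported as either value.
    failed = run.get("conclusion") in ("failure", "startup_failure")
    return failed and never_progressed and no_jobs


def is_copilot_push(run: dict) -> bool:
    """True for push events whose head branch matches copilot/*."""
    return run.get("event") == "push" and fnmatchcase(
        run.get("headBranch", ""), "copilot/*"
    )


# Sample record modelled on run 27862450162 (date assumed, see above).
sample = {
    "conclusion": "failure",
    "event": "push",
    "headBranch": "copilot/update-custom-css-themes",
    "createdAt": "2026-06-20T06:08:33Z",
    "startedAt": "2026-06-20T06:08:33Z",
    "updatedAt": "2026-06-20T06:08:33Z",
    "jobs": [],
}
print(is_startup_failure(sample) and is_copilot_push(sample))  # True
```
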

### Affected workflow and run IDs

- Workflow: `Skillet` (`.github/workflows/skillet.lock.yml`), `private: true`, `slash_command` (centralized, `name: "*"`).
- 21 failed runs, all `push` events on `copilot/*` branches: 27862450162, 27862210273, 27862038117, 27861738057, 27859225167, 27859181269, 27859173283, 27859099139, 27859022467, 27858955332, 27858921530, 27858868996, 27858836010, 27858612004, 27858594866, 27858560026, 27858531804, 27858098105, 27857778483, 27857320814, 27856881609.

### Probable root cause

The `Skillet` source on `main` triggers **only** on `workflow_dispatch` (centralized slash-command dispatch via `agentic_commands.yml`). The failing runs, however, fire on `event = push` against `copilot/*` feature branches. The most consistent explanation is that those branches carry a **stale / divergent compiled `skillet.lock.yml`** whose `on:` block still matches push events. The lock on `main` was last recompiled at 2026-06-20 07:59, after every failure in this window, so branches cut earlier would not have picked it up. GitHub then schedules the branch-local workflow on each Copilot push, and the run fails at startup (0s, no jobs, no logs), presumably because the run context does not match a valid slash-command activation. Result: every Copilot agent push emits one red `Skillet` run.

<details>
<summary>Evidence detail</summary>

- `run_summary.json` for 27862450162: `"event": "push"`, `"headBranch": "copilot/update-custom-css-themes"`, `"displayTitle": "Fix docs text contrast and Safari backdrop blur support"`, `Duration 0`, `Turns 0`.
- `main` `skillet.lock.yml` `on:` block contains only `workflow_dispatch` (+ internal `aw_context` input); there is no `push`/`pull_request` trigger, confirming the running config diverges from `main`.
- No firewall, MCP, or tooling anomalies (audit-diff clean) — rules out network/proxy regression.
</details>

### Proposed remediation

1. Confirm the `on:` triggers of `skillet.lock.yml` as it exists on a live `copilot/*` branch (e.g. `copilot/update-custom-css-themes`) and diff against `main`'s compiled lock.
2. If branch locks are stale, recompile/rebase Copilot branches, or have the compiler emit a guard so a centralized slash-command lock cannot be scheduled by `push`/`pull_request` on feature branches (e.g. an early `if` on `github.event_name == 'workflow_dispatch'`).
3. Consider scoping `Skillet` (and other `private` centralized commands) so branch pushes do not schedule them at all, eliminating the startup-failure noise.
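
The trigger diff in step 1 can be sketched as follows. This is an illustration only, assuming both lock files have already been parsed into Python objects. The helper names are hypothetical, and the `branch_on` value is an invented example of what a stale lock might contain, not the content of any real branch.

```python
def trigger_names(on_value) -> set:
    """Normalize a parsed workflow `on:` value to a set of event names.

    GitHub Actions accepts a single string, a list of strings, or a
    mapping keyed by event name.
    """
    if on_value is None:
        return set()
    if isinstance(on_value, str):
        return {on_value}
    if isinstance(on_value, dict):
        return set(on_value.keys())
    return set(on_value)


def extra_triggers(main_on, branch_on) -> set:
    """Events that schedule the branch copy but not the copy on main."""
    return trigger_names(branch_on) - trigger_names(main_on)


# `main` lock: workflow_dispatch only, with the internal aw_context input.
main_on = {"workflow_dispatch": {"inputs": {"aw_context": {"required": False}}}}
# Hypothetical stale branch lock that still lists a push trigger.
branch_on = {"push": {"branches": ["copilot/*"]}, "workflow_dispatch": None}

print(sorted(extra_triggers(main_on, branch_on)))  # ['push']
```

If the lock is loaded with a YAML 1.1 parser such as PyYAML, the unquoted `on` key can load as the boolean `True` rather than the string `"on"`, so the block should be looked up under both keys before it is passed to `trigger_names`.
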

### Success criteria / verification

1. Zero `Skillet` runs with `event = push` on `copilot/*` branches over a subsequent 24h window.
2. `Skillet` failure count in the failure-investigator 6h window drops from 21 to ~0 absent a real slash-command failure.
3. Any future `Skillet` failure carries an attributed failed job/step and non-zero duration (i.e. real execution, not startup_failure).
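
The three criteria above could be checked mechanically over a window of run records. This is a rough sketch, assuming each record combines the `workflowName`, `event`, `headBranch`, `conclusion`, `startedAt`, and `updatedAt` fields with a `jobs` list carrying per-job `conclusion` values. `check_success_criteria` is a hypothetical helper, not an existing gh-aw command.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def duration_seconds(run: dict) -> float:
    """Elapsed seconds between startedAt and updatedAt."""
    start = datetime.fromisoformat(run["startedAt"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(run["updatedAt"].replace("Z", "+00:00"))
    return (end - start).total_seconds()


def check_success_criteria(runs: list) -> dict:
    """Evaluate the verification criteria for the Skillet workflow."""
    skillet = [r for r in runs if r.get("workflowName") == "Skillet"]
    # Criterion 1: no push-triggered Skillet runs on copilot/* branches.
    push_runs = [
        r for r in skillet
        if r.get("event") == "push"
        and fnmatchcase(r.get("headBranch", ""), "copilot/*")
    ]
    # Criterion 2: report the failure count for comparison with the baseline of 21.
    failures = [r for r in skillet if r.get("conclusion") == "failure"]
    # Criterion 3: every failure has a failed job and a non-zero duration.
    unattributed = [
        r for r in failures
        if duration_seconds(r) == 0
        or not any(j.get("conclusion") == "failure" for j in r.get("jobs") or [])
    ]
    return {
        "no_push_runs_on_copilot": not push_runs,
        "skillet_failure_count": len(failures),
        "all_failures_attributed": not unattributed,
    }
```

A window that still contains the startup-failure signature reports `no_push_runs_on_copilot` and `all_failures_attributed` as `False`, while a genuine slash-command failure with a failed job leaves both `True`.
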

### Existing-issue correlation

- Not a duplicate: no open `agentic-workflows` issue references `Skillet` or this startup-failure signature.
- The other 2 in-window failures are already tracked and still reproducing (kept open): Code Simplifier [§27860467420](https://github.com/github/gh-aw/actions/runs/27860467420) → #40270 (`Execute GitHub Copilot CLI`, BYOK 403); Avenger [§27859038019](https://github.com/github/gh-aw/actions/runs/27859038019) → #40145 (`Parse agent logs`, ERR_CONFIG).

**References:**
- [§27862450162](https://github.com/github/gh-aw/actions/runs/27862450162)
- [§27859225167](https://github.com/github/gh-aw/actions/runs/27859225167)
- [§27856881609](https://github.com/github/gh-aw/actions/runs/27856881609)

Related to #39883

> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/27864940985) · 159.8 AIC · ⌖ 12.7 AIC · ⊞ 4.9K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
> - [x] expires on Jun 27, 2026, 12:08 AM UTC-08:00

---

### Update — 2026-06-20 19:13 UTC (re-confirmed, escalating)

**Fix the `Skillet` startup-failure flood on `copilot/*` pushes: this one cluster is now 82% of all agentic-workflow failures (73 of 89 in 6h), up ~3.5× from 21 at first report.** Same signature, no drift: `event = push`, `copilot/*` head branch, 0s duration, zero jobs/steps/logs, `createdAt == updatedAt`. The root cause and remediation from the original report are unchanged and verified as still applicable. This is now the top source of reliability noise in the repo and should be prioritized.

- Representative runs (this window): [§27880980627](https://github.com/github/gh-aw/actions/runs/27880980627) (`copilot/apply-suggestions-from-discussion-40483`), [§27878842425](https://github.com/github/gh-aw/actions/runs/27878842425), [§27877892017](https://github.com/github/gh-aw/actions/runs/27877892017) (`copilot/agent-persona-exploration-another-one`). Spot-checked via `gh run view`: all `event=push`, `createdAt == updatedAt`, 0s duration.
- Still-reproducing tracked neighbors (kept open, no new issue): Avenger `ERR_CONFIG: no structured log entries` → #40145 (5 runs this window, e.g. [§27880644893](https://github.com/github/gh-aw/actions/runs/27880644893)); Daily Issues Report Generator → #40380 (1 run, [§27874987840](https://github.com/github/gh-aw/actions/runs/27874987840)).
- New low-frequency watch item (not yet filed, not escalated): `Daily Security Observability Report` [§27876792899](https://github.com/github/gh-aw/actions/runs/27876792899) — 0-turn agent failure on `main`/`schedule`, 10.4m duration, audit `posture` delta write_capable→read_only vs successful baseline [§27429719048](https://github.com/github/gh-aw/actions/runs/27429719048). Single occurrence in 6h; monitoring one more cycle before filing to avoid duplicating the known 0-turn/ERR_CONFIG class.
- Not real failures this window (excluded, no action): `Smoke CI` (5× `cancelled`) and `Deployment Incident Monitor` (4× `cancelled`) — superseded runs, not agent failures.
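
The share and growth figures in this update follow from simple counting once `cancelled` runs are excluded. The sketch below reproduces the arithmetic on synthetic records. `failure_share` is a hypothetical helper, and the `Other` workflow name is a placeholder that stands in for the 16 non-Skillet failures.

```python
from collections import Counter


def failure_share(runs: list, workflow: str) -> float:
    """Fraction of failed runs that belong to one workflow.

    Runs with any other conclusion (for example 'cancelled') are ignored.
    """
    failed = [r for r in runs if r.get("conclusion") == "failure"]
    if not failed:
        return 0.0
    counts = Counter(r.get("workflowName") for r in failed)
    return counts[workflow] / len(failed)


# Synthetic window matching the reported counts: 73 + 16 failures, 9 cancelled.
runs = (
    [{"workflowName": "Skillet", "conclusion": "failure"}] * 73
    + [{"workflowName": "Other", "conclusion": "failure"}] * 16
    + [{"workflowName": "Smoke CI", "conclusion": "cancelled"}] * 5
    + [{"workflowName": "Deployment Incident Monitor", "conclusion": "cancelled"}] * 4
)

print(round(failure_share(runs, "Skillet") * 100))  # 82
print(round(73 / 21, 1))  # 3.5
```
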


> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/27881049283) · 236 AIC · ⌖ 13.3 AIC · ⊞ 4.9K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
