Skip to content

[aw-failures] P1: Daily Formal Spec Verifier — Copilot CLI idle-hangs mid-task, killed by 25-min action timeout (5-day 100% outage) #39278

Description

@github-actions

Add a Copilot SDK idle/turn watchdog that aborts (or recovers) a stalled session well before the 25-min action timeout — Formal Spec Verifier hangs after report_intent and has produced nothing for 5 straight days.

Problem statement

The Copilot CLI agent stalls early in its first turn and goes completely silent until the 25-minute action wrapper kills the step. No safe-output is produced and no work completes — a genuine mid-task hang, not a post-completion false-failure.

Affected workflow and run IDs

  • Daily Formal Spec Verifier (.github/workflows/daily-formal-spec-verifier.lock.yml, Copilot engine, model claude-sonnet-4.6) — 5/5 consecutive scheduled failures.
    • Representative (in-window): §27504260264 (2026-06-14T15:58Z).
    • Prior streak: §27471596105, §27428819181, §27362596806, §27290883003 (2026-06-10 → 06-13).
    • Last success comparator: §27220052669 (2026-06-09).

Root cause

The agent ran a few tool calls in its first ~3 minutes, then stopped all activity for ~24 minutes until the action timeout fired:

16:01:23Z  tool.execution_complete  report_intent  success=true
   ... (no further tool calls, no inference progress for ~24 min) ...
16:25:49Z  Process exiting with code: 1
16:25:49Z  ##[error]The action 'Execute GitHub Copilot CLI' has timed out after 25 minutes.

The Copilot SDK driver idle-hung after report_intent and never advanced; audit-diff vs the last success confirms a single 29m41s turn with only 2,430 output tokens / 14 bash calls / 67.2 AIC consumed and no firewall anomalies — i.e. the run burned a full action slot producing nothing. This is the same idle-hang family as #39200 (Dictation Prompt Generator session.idle timeout) but distinct: there the agent finished its work and created the PR before hanging, whereas here it stalls mid-task with no safe-output, so it is a true outage, not a false-failure.

Proposed remediation

  1. Add an inference/turn-level idle watchdog in the Copilot SDK driver that detects no tool-call or token progress for N minutes and aborts (or restarts the turn) with a classified flag (e.g. GH_AW_AGENTIC_ENGINE_IDLE_HANG), rather than letting the 25-min action wrapper be the only backstop.
  2. Investigate why the SDK driver stalls immediately after report_intent on this workflow (possible deadlock between report_intent handling and the next inference request).
  3. As a stopgap, lower this workflow's per-action timeout so a hung run fails fast and frees the runner sooner.

Success criteria / verification

  • Daily Formal Spec Verifier completes (emits its create_issue/noop safe-output) for ≥3 consecutive scheduled runs.
  • No 24-minute post-report_intent silence followed by a 25-min action timeout.
  • An idle-hang, if it recurs, is surfaced as a dedicated classified flag rather than a generic engine timeout.

Parent: #29109. Analyzed runs: 27504260264 (audit), 27220052669 (audit-diff baseline).
Related to #29109

Generated by 🔍 [aw] Failure Investigator (6h) · 457 AIC · ⌖ 12 AIC · ⊞ 4.5K ·

  • expires on Jun 21, 2026, 11:29 AM UTC-08:00

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions