Parent: #39883 (6h failure investigation report).
Problem statement
The PR Code Quality Reviewer agent job failed at Execute GitHub Copilot CLI after the Copilot SDK driver hung for 14m30s waiting for a session.idle event and hit the 870000 ms idle-timeout watchdog. The agent was actively executing tools (bash/grep/view) right up to the timeout — hasOutput=true, 77.3k tokens, 7 tool types, 1 turn — so this is a hang in the SDK idle-signal path, not an agent-logic failure.
Affected workflow and run IDs
- Workflow:
PR Code Quality Reviewer (.github/workflows/pr-code-quality-reviewer.lock.yml), engine copilot CLI, trigger pull_request, branch copilot/add-codemod-rename-allowed-team-members.
- Failed run: §27852098051 — 2026-06-19 22:56 UTC. duration 19.4m,
turns=1, 77.3k tokens, exit code 1.
- Nearest successful baseline (same config, 12 min earlier):
27851753205 — success, audit classification stable. → intermittent, not a config regression.
Evidence (agent-stdio.log)
[copilot-sdk-driver] [sdk-driver] error: Timeout after 870000ms waiting for session.idle
[copilot-harness] attempt 1: process closed exitCode=1 duration=14m 30s stdout=0B stderr=505092B hasOutput=true
[copilot-harness] attempt 1 failed: exitCode=1 failureClass=sdk_session_idle_timeout ... isSDKSessionIdleTimeoutError=true ... retriesRemaining=3
Probable root cause
The SDK driver waits for a session.idle event that never arrives while the agent is mid-execution: the final tool calls complete but no idle signal is emitted, so the 870s watchdog fires and exits 1. This is distinct from #39946 (where all classifiers are false) — here the failure is explicitly classified sdk_session_idle_timeout. Likely a missed/dropped idle event in the SDK driver on long single-turn runs.
Proposed remediation
- Investigate why
session.idle is not emitted after the final tool completes on long turns; resolve the driver on terminal tool completion, not only on an idle event.
- The attempt reports
retriesRemaining=3 yet the run ended in failure — confirm the sdk_session_idle_timeout retry path actually re-runs the turn; if not, fix it.
- Preserve and surface the partial agent output on idle-timeout instead of discarding it.
Success criteria / verification
Generated by 🔍 [aw] Failure Investigator (6h) · 172.3 AIC · ⌖ 11.7 AIC · ⊞ 4.9K · ◷
Closing — resolved (fresh evidence 2026-06-22)
PR Code Quality Reviewer ran green across the 6 most recent runs on 2026-06-22 (02:00–06:25Z, pull_request). The Copilot SDK session.idle 870s timeout no longer reproduces.
- Recent runs (6): all
success, including 2026-06-22 06:25Z.
- No
session.idle timeout observed in current run history.
Closing as fixed. Reopen if the idle-timeout reappears.
Auto-closed by 6h Failure Investigator after correlating fresh run history.
Generated by 🔍 [aw] Failure Investigator (6h) · 253.8 AIC · ⊞ 4.9K · ◷
Parent: #39883 (6h failure investigation report).
Problem statement
The
PR Code Quality Revieweragent job failed atExecute GitHub Copilot CLIafter the Copilot SDK driver hung for 14m30s waiting for asession.idleevent and hit the870000ms idle-timeout watchdog. The agent was actively executing tools (bash/grep/view) right up to the timeout —hasOutput=true, 77.3k tokens, 7 tool types, 1 turn — so this is a hang in the SDK idle-signal path, not an agent-logic failure.Affected workflow and run IDs
PR Code Quality Reviewer(.github/workflows/pr-code-quality-reviewer.lock.yml), enginecopilotCLI, triggerpull_request, branchcopilot/add-codemod-rename-allowed-team-members.turns=1, 77.3k tokens, exit code 1.27851753205— success, audit classificationstable. → intermittent, not a config regression.Evidence (agent-stdio.log)
Probable root cause
The SDK driver waits for a
session.idleevent that never arrives while the agent is mid-execution: the final tool calls complete but no idle signal is emitted, so the 870s watchdog fires and exits 1. This is distinct from #39946 (where all classifiers are false) — here the failure is explicitly classifiedsdk_session_idle_timeout. Likely a missed/dropped idle event in the SDK driver on long single-turn runs.Proposed remediation
session.idleis not emitted after the final tool completes on long turns; resolve the driver on terminal tool completion, not only on an idle event.retriesRemaining=3yet the run ended in failure — confirm thesdk_session_idle_timeoutretry path actually re-runs the turn; if not, fix it.Success criteria / verification
sdk_session_idle_timeoutruns with output either succeed via retry or are reported as a degraded (warning) outcome, not a hard agent-job failure.Related to [aw-failures] [aw] Failure Investigation Report — 6h window (2026-06-17 19:34 UTC) #39883
Closing — resolved (fresh evidence 2026-06-22)
PR Code Quality Reviewer ran green across the 6 most recent runs on 2026-06-22 (02:00–06:25Z, pull_request). The Copilot SDK
session.idle870s timeout no longer reproduces.success, including 2026-06-22 06:25Z.session.idletimeout observed in current run history.Closing as fixed. Reopen if the idle-timeout reappears.
Auto-closed by 6h Failure Investigator after correlating fresh run history.