fix(sdk-driver): resolve idle-timeout as success when output collected and all tools complete by Copilot · Pull Request #40419 · github/gh-aw

Copilot · 2026-06-20T03:00:51Z

The Copilot SDK driver would classify a run as failed (exit 1) whenever sendAndWait hit the 870s session.idle watchdog — even when the agent had produced output and all tool calls had already completed. The SDK simply dropped the idle signal on long single-turn runs; the work was done but the driver discarded it.

Changes

copilot_sdk_session.cjs

Idle-timeout → success: when sendAndWait throws a timeout error, hasOutput=true, and pendingToolCalls.size === 0 (all tracked tool calls finished), return exitCode: 0 with the accumulated output instead of failing.
Fix hasOutput bug: the catch block hardcoded hasOutput: false; now uses the actual accumulated value so the harness gets correct retry signals for cases that are genuine failures.
Preserve output on error: output in the error return now uses the accumulated assistant-message content instead of discarding it in favour of the error string.
Moved pendingToolCalls to function scope (was only accessible inside try) so the catch block can inspect in-flight tool state.
Extracted SDK_IDLE_TIMEOUT_PATTERN constant matching the specific format used in copilot_harness.cjs, replacing a looser inline regex.

// Before — always discarded output on any sendAndWait error
} catch (err) {
  return { exitCode: 1, output: failure.message, hasOutput: false, durationMs };
}

// After — idle-timeout with completed work resolves as success
const isIdleTimeout = !catastrophicToolDenialsError && SDK_IDLE_TIMEOUT_PATTERN.test(failure.message);
if (isIdleTimeout && hasOutput && pendingToolCalls.size === 0) {
  log(`warning: SDK idle-timeout with collected output and no pending tool calls — treating as completed`);
  return { exitCode: 0, output, hasOutput, durationMs };
}
return { exitCode: 1, output: hasOutput ? output : failure.message, hasOutput, durationMs };
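
The catch-block decision above can be exercised in isolation. The following is a minimal sketch, not the driver's actual code: `classifySendAndWaitFailure` and its parameter names are hypothetical, and only the regex and the three-condition guard are taken from this PR.

```javascript
// Sketch only: `classifySendAndWaitFailure` is a hypothetical helper that mirrors
// the guard from the PR; it is not a function in copilot_sdk_session.cjs.
const SDK_IDLE_TIMEOUT_PATTERN = /Timeout after \d+ms waiting for session\.idle/;

function classifySendAndWaitFailure({ message, output, hasOutput, pendingCount, catastrophicToolDenials = false }) {
  // A tool-denial failure must never be promoted to success, even if the text matches.
  const isIdleTimeout = !catastrophicToolDenials && SDK_IDLE_TIMEOUT_PATTERN.test(message);
  if (isIdleTimeout && hasOutput && pendingCount === 0) {
    return { exitCode: 0, output, hasOutput };
  }
  // Genuine failure: keep accumulated output if any, otherwise surface the error text.
  return { exitCode: 1, output: hasOutput ? output : message, hasOutput };
}

const timeout = "Timeout after 870000ms waiting for session.idle";
const scenarios = {
  completed: classifySendAndWaitFailure({ message: timeout, output: "done", hasOutput: true, pendingCount: 0 }),
  pendingTools: classifySendAndWaitFailure({ message: timeout, output: "working on it", hasOutput: true, pendingCount: 1 }),
  noOutput: classifySendAndWaitFailure({ message: timeout, output: "", hasOutput: false, pendingCount: 0 }),
  toolDenials: classifySendAndWaitFailure({ message: timeout, output: "done", hasOutput: true, pendingCount: 0, catastrophicToolDenials: true }),
};
console.log(scenarios);
```

Only the `completed` scenario resolves with `exitCode: 0`; the other three keep `exitCode: 1`, and `noOutput` falls back to the timeout message as its output.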

copilot_sdk_driver.test.cjs — three regression tests:

idle-timeout + output + all tool calls complete → exitCode: 0
idle-timeout + pending tool calls → exitCode: 1 (harness retries)
idle-timeout + no output → exitCode: 1 (nothing to surface)

…and no pending tools

When session.sendAndWait times out waiting for session.idle but the agent produced output and all tracked tool calls have already completed, the SDK simply failed to emit the idle signal; the agent's work is done. Resolve with exitCode 0 instead of 1 so the harness does not classify the run as a failure or waste retry budget.

Also fix the catch block to:
- Use the accumulated hasOutput value (was hardcoded false)
- Preserve the accumulated output in the error return (was discarded)

Adds three regression tests covering:
- idle-timeout with output + all tool calls complete → exitCode 0 (success)
- idle-timeout with pending tool calls → exitCode 1 (genuine failure, retry)
- idle-timeout with no output collected → exitCode 1 (no work done)

Closes #40418

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Extract the inline idle-timeout regex into a named module-level constant SDK_IDLE_TIMEOUT_PATTERN, matching the more specific format already used in copilot_harness.cjs (SDK_SESSION_IDLE_TIMEOUT_PATTERN). This makes the intent clear and is easy to update if the SDK error message format changes. Also export the constant for testability.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-06-20T03:24:02Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #40419 does not have the 'implementation' label and has 0 new lines of code in business logic directories (≤100 threshold). Only 2 files changed, neither in tracked business-logic paths.

github-actions · 2026-06-20T03:24:03Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-20T03:24:04Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-06-20T03:24:06Z

✅ Test Quality Sentinel completed test quality analysis.

Copilot

Pull request overview

This PR adjusts the Copilot SDK session driver so that a sendAndWait idle-timeout (waiting for session.idle) is treated as a successful completion when the agent has already produced output and there are no tracked in-flight tool calls, preventing long single-turn runs from being incorrectly marked as failures.

Changes:

Add SDK_IDLE_TIMEOUT_PATTERN and use it in runWithCopilotSDK to detect the specific SDK session.idle timeout message.
On idle-timeout errors, return exitCode: 0 when hasOutput === true and pendingToolCalls.size === 0; otherwise preserve accumulated output and return exitCode: 1 with correct hasOutput.
Add regression tests covering the success case and two failure cases (pending tool calls, no output).
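
The two inputs to that decision, `hasOutput` and `pendingToolCalls.size`, come from state accumulated while session events arrive. A rough sketch of that bookkeeping follows. The event type names (`tool.start`, `tool.complete`, `assistant.message`) and the `createSessionTracker` helper are assumptions made for illustration; the real SDK event shapes may differ.

```javascript
// Sketch only: event type names and `createSessionTracker` are hypothetical.
function createSessionTracker() {
  const pendingToolCalls = new Set(); // ids of tool calls that started but have not completed
  let output = "";

  function onEvent(event) {
    switch (event.type) {
      case "tool.start":
        pendingToolCalls.add(event.toolCallId);
        break;
      case "tool.complete":
        pendingToolCalls.delete(event.toolCallId);
        break;
      case "assistant.message":
        output += event.content;
        break;
      default:
        break; // ignore events this sketch does not model
    }
  }

  // State a catch block could inspect after sendAndWait throws.
  const snapshot = () => ({ output, hasOutput: output.length > 0, pendingCount: pendingToolCalls.size });
  return { onEvent, snapshot };
}

const tracker = createSessionTracker();
tracker.onEvent({ type: "tool.start", toolCallId: "call-1" });
tracker.onEvent({ type: "assistant.message", content: "working on it" });
const midRun = tracker.snapshot(); // one tool still in flight
tracker.onEvent({ type: "tool.complete", toolCallId: "call-1" });
const finished = tracker.snapshot(); // output collected, nothing pending
console.log(midRun, finished);
```

A timeout observed at `midRun` would stay a failure because a tool call is still in flight, while the same timeout at `finished` meets both conditions for success.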

| File | Description |
| --- | --- |
| actions/setup/js/copilot_sdk_session.cjs | Tracks pending tool calls across `try/catch`, preserves accumulated output on error, and treats SDK idle-timeout as success when work is complete. |
| actions/setup/js/copilot_sdk_driver.test.cjs | Adds regression coverage for idle-timeout behavior across success/failure scenarios. |

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 0

github-actions · 2026-06-20T03:29:54Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 90/100 — Excellent

Analyzed 3 test(s) across 1 JavaScript file: 3 design tests, 0 implementation tests, 0 guideline violations.

📊 Metrics & Test Classification (3 tests analyzed)

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 3 |
| ✅ Design tests (behavioral contracts) | 3 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 3 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | ⚠️ Yes: `copilot_sdk_driver.test.cjs` +143 lines vs `copilot_sdk_session.cjs` +30 lines (ratio ≈ 4.8:1) |
| 🚨 Coding-guideline violations | 0 |

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| `resolves exitCode 0 on SDK idle-timeout when output collected and all tool calls complete` | `copilot_sdk_driver.test.cjs` | ✅ Design | None |
| `returns exitCode 1 on SDK idle-timeout when tool calls are still pending` | `copilot_sdk_driver.test.cjs` | ✅ Design | None |
| `returns exitCode 1 on SDK idle-timeout with no output collected` | `copilot_sdk_driver.test.cjs` | ✅ Design | None |

Go: 0 (*_test.go); JavaScript: 3 (*.test.cjs). No other languages detected.

ℹ️ Note on Test Inflation (non-blocking)

The test file added 143 lines versus 30 lines in the production file (≈ 4.8:1 ratio, threshold 2:1). Per the rubric this deducts 10 points from the Proportional Growth component. In context this is expected and justified: the production fix is compact (pattern match + conditional in the catch block) but requires three distinct test scenarios to cover the state machine — success (all tools complete + output), failure (pending tools), and failure (no output). No action is required, but it is noted per the scoring rules.
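
The arithmetic behind the ratio and the score can be checked directly. This is a sketch under one assumption: that every other rubric component scored full marks, which is consistent with the reported 90/100 but is not spelled out in the report. The constant names are illustrative.

```javascript
// Figures from the report: lines added per file, the 2:1 threshold, and the 10-point deduction.
const testLinesAdded = 143;
const productionLinesAdded = 30;
const INFLATION_THRESHOLD = 2; // test:production ratio above which the deduction applies
const INFLATION_PENALTY = 10;

const ratio = testLinesAdded / productionLinesAdded;
const ratioLabel = `${ratio.toFixed(1)}:1`;

// Assumption: all other rubric components are at full marks.
const score = 100 - (ratio > INFLATION_THRESHOLD ? INFLATION_PENALTY : 0);

console.log(ratioLabel, score); // 4.8:1 90
```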

Verdict

✅ Check passed. 0% implementation tests (threshold: 30%). All three tests verify observable behavioral contracts — the exitCode and hasOutput values returned by runWithCopilotSDK under three distinct idle-timeout scenarios — and each covers an error/edge case. Mocking targets (CopilotClient, SDK session) are external I/O dependencies, which is the accepted pattern for this codebase.

🧪 Test quality analysis by Test Quality Sentinel · 54.4 AIC · ⌖ 8.79 AIC · ⊞ 8.3K · ◷

github-actions

✅ Test Quality Sentinel: 90/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 3 tests verify observable behavioral contracts (exitCode and hasOutput under distinct idle-timeout scenarios). Mocking targets are external SDK I/O dependencies.

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — approving with minor suggestions on test completeness and a cross-file DRY opportunity.

📋 Key Themes & Highlights

Key Themes

Cross-file regex duplication: SDK_IDLE_TIMEOUT_PATTERN (new) and SDK_SESSION_IDLE_TIMEOUT_PATTERN (existing in copilot_harness.cjs) are identical. The constant is now exported — the harness could import it to eliminate the "keep in sync" burden.
Test specification gaps: The three regression tests are well-structured, but the two failure paths don't assert cleanup (disconnect/stop) and the no-output test leaves output content unspecified.
Untested guard: The !catastrophicToolDenialsError condition that prevents tool-denial failures from being promoted to success has no dedicated test.

Positive Highlights

✅ Root cause addressed cleanly: hoisting pendingToolCalls to function scope is the minimal structural change needed — nothing was added unnecessarily.
✅ Three-condition guard (isIdleTimeout && hasOutput && pendingToolCalls.size === 0) is precise and well-reasoned; each condition maps directly to a test case.
✅ hasOutput and output fix on the error path are bonuses that correct silent data loss and wrong retry signals.
✅ Mock event sequences (start/complete pairing) in the tests exercise real state transitions, not just surface throw/catch behaviour.
✅ The PR description is excellent — the before/after code block makes the change immediately graspable.
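
The hoisting mentioned in the first highlight is needed because `const` and `let` are block-scoped in JavaScript: a binding declared inside the `try` body is not visible in the `catch` block. A small sketch, with illustrative function and variable names that are not taken from the driver:

```javascript
// Declared inside `try`: the catch block cannot see the binding.
function scopedInsideTry() {
  try {
    const pending = new Set(["call-1"]);
    throw new Error(`boom with ${pending.size} pending`);
  } catch (err) {
    // `pending` is out of scope here; `typeof` on an undeclared name yields "undefined".
    return typeof pending;
  }
}

// Declared at function scope: the catch block can inspect in-flight state.
function hoistedToFunctionScope() {
  const pending = new Set();
  try {
    pending.add("call-1");
    throw new Error("boom");
  } catch (err) {
    return pending.size;
  }
}

console.log(scopedInsideTry()); // "undefined"
console.log(hoistedToFunctionScope()); // 1
```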

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 73.3 AIC · ⌖ 10.6 AIC · ⊞ 6.9K

github-actions · 2026-06-20T03:31:22Z

+// deadline waiting for the session.idle event.  This matches the message format
+// "Timeout after <N>ms waiting for session.idle" produced by the Copilot SDK.
+// Keep in sync with SDK_SESSION_IDLE_TIMEOUT_PATTERN in copilot_harness.cjs.
+const SDK_IDLE_TIMEOUT_PATTERN = /Timeout after \d+ms waiting for session\.idle/;


[/diagnose] SDK_IDLE_TIMEOUT_PATTERN is identical to SDK_SESSION_IDLE_TIMEOUT_PATTERN already defined in copilot_harness.cjs (line 120). The "Keep in sync" comment flags a live drift risk — the two patterns can diverge silently.

💡 Suggestion: import from here instead of duplicating

Since SDK_IDLE_TIMEOUT_PATTERN is now exported, copilot_harness.cjs could import it directly:

// copilot_harness.cjs
const { SDK_IDLE_TIMEOUT_PATTERN } = require("./copilot_sdk_session.cjs");
// ... replace SDK_SESSION_IDLE_TIMEOUT_PATTERN usages with SDK_IDLE_TIMEOUT_PATTERN

This eliminates the "keep in sync" maintenance burden and makes the single source of truth explicit.
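
Whichever module ends up owning the pattern, it helps to be clear about what it accepts. The sketch below runs the regex from the diff against a few sample strings; apart from the first, the sample messages are made up for illustration.

```javascript
// Regex copied from the diff; the sample messages (other than the first) are invented.
const SDK_IDLE_TIMEOUT_PATTERN = /Timeout after \d+ms waiting for session\.idle/;

const samples = [
  "Timeout after 870000ms waiting for session.idle", // exact SDK format
  "Error: Timeout after 870000ms waiting for session.idle (wrapped)", // unanchored, so wrapped text still matches
  "Timeout after 870s waiting for session.idle", // unit is "s", not "ms"
  "Timeout after 870000ms waiting for sessionXidle", // the dot is escaped, so it must be a literal "."
  "Request timed out", // unrelated timeout
];

const matches = samples.map(message => SDK_IDLE_TIMEOUT_PATTERN.test(message));
console.log(matches); // [ true, true, false, false, false ]
```

Because the pattern has no `g` flag, repeated `test` calls carry no `lastIndex` state, so sharing one instance between modules is safe in that respect.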

github-actions · 2026-06-20T03:31:22Z

+    // output and all tracked tool calls have already completed, the session work is
+    // done — the SDK simply failed to emit the idle signal.  Treat it as a successful
+    // run so the harness does not classify it as a failure or waste retry attempts.
+    const isIdleTimeout = !catastrophicToolDenialsError && SDK_IDLE_TIMEOUT_PATTERN.test(failure.message);


[/tdd] The !catastrophicToolDenialsError guard is a critical safety rail — it prevents a tool-denial failure whose error message accidentally matches the idle-timeout pattern from being promoted to success. But this branch is not covered by a regression test.

💡 Suggested test case

it("returns exitCode 1 when catastrophic tool denials fires even if timeout message matches", async () => {
  // Simulate maxToolDenials being hit; the driver should NOT treat this as
  // an idle-timeout success even if the error text happens to match.
  const session = {
    sessionId: "session-denial-not-idle",
    on: handler => {
      onEvent = handler;
    },
    sendAndWait: vi.fn().mockImplementation(async () => {
      // fire enough tool-denial events to breach the threshold, then stall
      throw new Error("Timeout after 870000ms waiting for session.idle");
    }),
    disconnect: vi.fn().mockResolvedValue(undefined),
  };
  // ... configure maxToolDenials:1 and fire denial events ...
  expect(result.exitCode).toBe(1);
});

Without this test, a future refactor that accidentally drops the !catastrophicToolDenialsError condition would go undetected.

github-actions · 2026-06-20T03:31:22Z

+      });
+
+      expect(result.exitCode).toBe(1);
+      expect(result.hasOutput).toBe(false);


[/tdd] This test verifies exitCode: 1 and hasOutput: false but doesn't assert what output contains. When hasOutput is false, the implementation returns failure.message — so the harness receives the timeout error string, not an empty value. Asserting this fully specifies the behaviour and prevents a silent regression if the fallback is changed.

💡 Suggested addition

// Add after the existing expects:
expect(result.output).toContain("waiting for session.idle");
// confirms the error string is surfaced to the harness for diagnostics

github-actions · 2026-06-20T03:31:22Z

+
+      expect(result.exitCode).toBe(1);
+      expect(result.hasOutput).toBe(true);
+      expect(result.output).toContain("working on it");


[/tdd] The pending-tools and no-output failure tests don't assert that cleanup (disconnect, stop) was called, unlike the success-case test above. The finally block should run on all exit paths — adding these assertions prevents a future regression where error handling disrupts cleanup.

💡 Suggested additions (also applies to the no-output test)

// Add to both failure tests:
expect(disconnect).toHaveBeenCalledTimes(1);
expect(stop).toHaveBeenCalledTimes(1);
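
The reason these assertions should hold on every path is that a `finally` block runs even when `try` or `catch` returns early. The sketch below shows this with a synchronous stand-in: `runWithCleanup` and the `calls` log are hypothetical, and the real driver is asynchronous, but `finally` behaves the same way in `async` functions.

```javascript
// Sketch only: a synchronous stand-in for the driver's try/catch/finally shape.
function runWithCleanup(work, calls) {
  try {
    work();
    return { exitCode: 0 };
  } catch (err) {
    return { exitCode: 1, output: err.message };
  } finally {
    // Runs after either return above, before the caller receives the value.
    calls.push("disconnect", "stop");
  }
}

const successCalls = [];
const success = runWithCleanup(() => {}, successCalls);

const failureCalls = [];
const failure = runWithCleanup(() => {
  throw new Error("Timeout after 870000ms waiting for session.idle");
}, failureCalls);

console.log(success, successCalls);
console.log(failure, failureCalls);
```

Both runs record the same two cleanup calls, which is the invariant the suggested `toHaveBeenCalledTimes(1)` assertions would lock in for the failure tests.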

github-actions

🔎 Code quality review by PR Code Quality Reviewer · 153.4 AIC · ⌖ 10.2 AIC · ⊞ 5.1K

github-actions · 2026-06-20T03:38:27Z

+// deadline waiting for the session.idle event.  This matches the message format
+// "Timeout after <N>ms waiting for session.idle" produced by the Copilot SDK.
+// Keep in sync with SDK_SESSION_IDLE_TIMEOUT_PATTERN in copilot_harness.cjs.
+const SDK_IDLE_TIMEOUT_PATTERN = /Timeout after \d+ms waiting for session\.idle/;


Exported pattern is not consumed by its stated peer — copilot_harness.cjs still carries its own copy. Two independent regexes can silently drift.

💡 Details

The comment says "Keep in sync with SDK_SESSION_IDLE_TIMEOUT_PATTERN in copilot_harness.cjs", implying the export is meant to be a single source of truth. But copilot_harness.cjs defines its own copy at line 120 and never imports from this module:

// copilot_harness.cjs line 120: still exists, still independent
const SDK_SESSION_IDLE_TIMEOUT_PATTERN = /Timeout after \d+ms waiting for session\.idle/;

The export is semantically incomplete.

There is also a diagnostic side-effect: for the idle-timeout + pending-tools + output path, this PR correctly returns output = <assistant content> rather than the error string. That means isSDKSessionIdleTimeoutError(result.output) in the harness will now return false for those cases, so classifyCopilotFailure logs partial_execution instead of sdk_session_idle_timeout. Retry logic is unaffected (keyed on hasOutput), but failure traces become less specific.

Suggested fix: import from this module in copilot_harness.cjs:

const { SDK_IDLE_TIMEOUT_PATTERN } = require('./copilot_sdk_session.cjs');
// retain old name via alias if needed:
const SDK_SESSION_IDLE_TIMEOUT_PATTERN = SDK_IDLE_TIMEOUT_PATTERN;
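
The diagnostic side-effect described in this comment can be reproduced in a few lines. This is a sketch under assumptions: the body of `isSDKSessionIdleTimeoutError` here is a guess at a plain pattern test, since the harness implementation is not shown in this PR, and the two result objects are hand-built examples.

```javascript
// Sketch only: a guessed stand-in for the harness helper named in the review.
const SDK_SESSION_IDLE_TIMEOUT_PATTERN = /Timeout after \d+ms waiting for session\.idle/;

function isSDKSessionIdleTimeoutError(text) {
  return SDK_SESSION_IDLE_TIMEOUT_PATTERN.test(text);
}

const failureMessage = "Timeout after 870000ms waiting for session.idle";

// Before this PR: the error return always carried the error string as output.
const resultBefore = { exitCode: 1, output: failureMessage, hasOutput: false };

// After this PR (idle-timeout + pending tools + output): output is the assistant content.
const resultAfter = { exitCode: 1, output: "working on it", hasOutput: true };

const detectedBefore = isSDKSessionIdleTimeoutError(resultBefore.output);
const detectedAfter = isSDKSessionIdleTimeoutError(resultAfter.output);
console.log(detectedBefore, detectedAfter); // true false
```

If the more specific classification matters for failure traces, one option would be to carry the error text in a separate field alongside `output`; that is a design suggestion, not something this PR does.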

github-actions · 2026-06-20T03:38:27Z

+
+      expect(result.exitCode).toBe(1);
+      expect(result.hasOutput).toBe(true);
+      expect(result.output).toContain("working on it");


Tests 2 and 3 don't assert that disconnect and stop are called, leaving the error-path cleanup unverified. Test 1 (success path) asserts both; the failure paths should too.

💡 Suggested additions

For the pending-tools test (after line 230):

expect(disconnect).toHaveBeenCalledTimes(1);
expect(stop).toHaveBeenCalledTimes(1);

For the no-output test (after line 262), the same assertions apply, and it's also worth adding:

expect(result.output).toBe('Timeout after 870000ms waiting for session.idle');

to lock in the new behaviour where output = failure.message when hasOutput is false (the fix this PR introduces — currently this invariant has no test coverage).

Without the cleanup assertions, a regression that skips disconnect/stop in the catch path would not be caught by these two tests.

github-actions · 2026-06-20T06:40:00Z

@copilot review all comments and address unresolved review feedback.

Generated by 👨‍🍳 PR Sous Chef · 87.7 AIC · ⌖ 1.04 AIC · ⊞ 17.2K · ◷

github-actions · 2026-06-20T06:40:02Z

Fix the open review threads: dedupe the timeout regex and add cleanup/output assertions to the failure tests.

Generated by 👨‍🍳 PR Sous Chef · 87.7 AIC · ⌖ 1.04 AIC · ⊞ 17.2K · ◷

github-actions · 2026-06-20T08:06:07Z

@copilot review all comments and address unresolved review feedback.
Please address the open review threads, especially the regex duplication and failure-path cleanup/output assertions, then re-request review.

Generated by 👨‍🍳 PR Sous Chef · 51.6 AIC · ⌖ 1.07 AIC · ⊞ 17.2K · ◷

github-actions · 2026-06-20T10:28:41Z

@copilot review all comments and address the unresolved review feedback.
pr-sous-chef run: https://github.com/github/gh-aw/actions/runs/27868244120

Generated by 👨‍🍳 PR Sous Chef · 57.2 AIC · ⌖ 1.06 AIC · ⊞ 17.2K · ◷

github-actions · 2026-06-20T11:25:10Z

@copilot review all comments and address unresolved review feedback. Then refresh the branch, rerun checks, and summarize anything still blocking mergeability.

Generated by 👨‍🍳 PR Sous Chef · 72.3 AIC · ⌖ 1.04 AIC · ⊞ 17.2K · ◷

Initial plan

267c5d5

Copilot AI assigned Copilot and pelikhan Jun 20, 2026

Copilot started work on behalf of pelikhan June 20, 2026 03:01

Copilot AI and others added 2 commits June 20, 2026 03:15

Copilot AI changed the title ~~[WIP] Fix Copilot SDK session.idle timeout issue in PR Code Quality Reviewer~~ fix(sdk-driver): resolve idle-timeout as success when output collected and all tools complete Jun 20, 2026

Copilot AI requested a review from pelikhan June 20, 2026 03:17

Copilot finished work on behalf of pelikhan June 20, 2026 03:17

pelikhan marked this pull request as ready for review June 20, 2026 03:22

Copilot AI review requested due to automatic review settings June 20, 2026 03:22

Copilot started reviewing on behalf of pelikhan June 20, 2026 03:23

Copilot AI reviewed Jun 20, 2026

github-actions Bot mentioned this pull request Jun 20, 2026

[aw] No-Op Runs #39849

Open

github-actions Bot approved these changes Jun 20, 2026

github-actions Bot reviewed Jun 20, 2026

github-actions Bot mentioned this pull request Jun 20, 2026

[team-status] Team Daily Status — 2026-06-20 #40461

Closed

pelikhan merged commit 7bb0ca2 into main Jun 20, 2026
54 of 62 checks passed

pelikhan deleted the copilot/aw-failures-fix-sdk-idle-timeout branch June 20, 2026 14:47

3 participants
