Fix flaky SDK E2E tests#1379
Conversation
Avoid waiting for full assistant completions in tool-filter tests when the assertions only need captured CAPI requests. Replace fake echo MCP server configs with the shared Node MCP test server and wait for MCP connectivity before prompts in real MCP E2E tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Apply Ruff formatting to the new exchange polling helper. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Updates multi-language SDK E2E tests to remove flakiness caused by waiting on full assistant completions when only request/connection state matters, and replaces non-portable stdio MCP stand-ins (echo/true) with the shared Node-based MCP test server plus explicit “connected” polling.
Changes:
- Switch tool-filtering assertions (available/excluded/default-agent excluded tools) to poll captured CAPI exchanges instead of awaiting assistant completion.
- Update MCP server E2E coverage to run a real stdio MCP subprocess (
test/harness/test-mcp-server.mjs) and wait for CONNECTED status before asserting/continuing. - Trim resume snapshots/tests to validate resumed session identity and MCP connectivity without requiring an additional model turn.
Show a summary per file
| File | Description |
|---|---|
| test/snapshots/mcp_and_agents/should_accept_mcp_server_configuration_on_session_resume.yaml | Removes second prompt/response turn so resume test no longer depends on an extra model call. |
| test/snapshots/mcp_and_agents/accept_mcp_server_config_on_resume.yaml | Same as above for Go subtest snapshot naming. |
| rust/tests/e2e/session.rs | Tool-filter tests now send without waiting and assert against captured CAPI exchanges. |
| rust/tests/e2e/rpc_mcp_and_skills.rs | Uses shared Node MCP test server script with explicit harness working directory. |
| python/e2e/testharness/context.py | Adds wait_for_exchanges() helper to poll for captured exchanges deterministically. |
| python/e2e/test_session_e2e.py | Tool-filter tests now wait for captured exchanges and always disconnect sessions. |
| python/e2e/test_session_config_e2e.py | Resume/tool assertions now wait for captured exchanges instead of assistant completion. |
| python/e2e/test_rpc_mcp_and_skills_e2e.py | Runs real MCP stdio server + waits for CONNECTED before MCP assertions. |
| python/e2e/test_mcp_and_agents_e2e.py | Runs real MCP stdio server + waits for CONNECTED; trims resume assertion to connectivity. |
| nodejs/test/e2e/session.e2e.test.ts | Tool-filter tests now poll proxy exchanges (retry) and ensure session disconnect in finally. |
| nodejs/test/e2e/session_config.e2e.test.ts | Resume/tool assertions now poll proxy exchanges instead of waiting on assistant completion. |
| nodejs/test/e2e/rpc_mcp_and_skills.e2e.test.ts | Uses shared MCP test server + waits for MCP CONNECTED before listing/asserting. |
| nodejs/test/e2e/mcp_and_agents.e2e.test.ts | Uses shared MCP test server + waits for MCP CONNECTED; trims resume flow to connectivity. |
| java/src/test/java/com/github/copilot/sdk/McpAndAgentsTest.java | Uses shared MCP test server and polls MCP status to CONNECTED; trims resume to connectivity. |
| java/src/test/java/com/github/copilot/sdk/E2ETestContext.java | Exposes getRepoRoot() so tests can locate shared harness assets. |
| go/internal/e2e/testharness/context.go | Adds WaitForExchanges polling helper for deterministic exchange capture. |
| go/internal/e2e/session_e2e_test.go | Tool-filter tests now assert via captured exchanges and use t.Cleanup for disconnect. |
| go/internal/e2e/session_config_e2e_test.go | Resume/tool assertion now sends without waiting and polls captured exchanges. |
| go/internal/e2e/rpc_mcp_and_skills_e2e_test.go | Uses shared MCP test server + waits for MCP CONNECTED before assertions. |
| go/internal/e2e/mcp_server_helpers_test.go | Adds shared helpers for MCP stdio server config + CONNECTED polling. |
| go/internal/e2e/mcp_and_agents_e2e_test.go | Uses shared MCP helpers; trims resume to connectivity and waits for CONNECTED where relevant. |
| dotnet/test/Harness/E2ETestBase.cs | Centralizes exchange polling + shared MCP server config discovery + MCP CONNECTED polling. |
| dotnet/test/E2E/SessionMcpAndAgentConfigE2ETests.cs | Uses shared MCP helpers; trims resume to connectivity and removes assistant-dependent assertions. |
| dotnet/test/E2E/SessionE2ETests.cs | Tool-filter assertions now rely on captured exchanges via helper instead of assistant completion. |
| dotnet/test/E2E/SessionConfigE2ETests.cs | Resume/tool assertion now uses captured exchange helper and ensures dispose in finally. |
| dotnet/test/E2E/RpcMcpAndSkillsE2ETests.cs | Uses shared MCP server config + waits for CONNECTED before MCP RPC assertions. |
| dotnet/test/E2E/PreMcpToolCallHookE2ETests.cs | Renames local harness-dir finder to avoid confusion with shared MCP harness locator. |
Copilot's findings
- Files reviewed: 27/27 changed files
- Comments generated: 0
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
Use SendAndWaitAsync so the test subscribes for response events before sending the prompt, and dispose the session after the custom config dir check. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
Start the send-and-wait operation before awaiting permission callbacks so response events are observed by the operation's subscription. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cross-SDK Consistency Review ✅This PR modifies only test infrastructure across all language SDKs (dotnet, go, java, nodejs, python, rust) — no public SDK API surface changes were made. The test fixes (polling CAPI exchanges for tool-filter assertions, using the shared Node MCP test server, trimming snapshot-sensitive resume tests) are applied consistently across all SDK implementations. No cross-language consistency issues were identified.
|
Live runtime CI was timing out in SDK E2E tests for reasons unrelated to SDK behavior: tool-filtering cases waited for full assistant responses when only the outbound CAPI request shape mattered, and MCP config tests used
echo/truestand-ins that are not portable stdio MCP servers on Windows.This updates the tests to assert deterministic state instead of depending on fragile assistant completions or platform-specific commands:
echofixtures alone.Validation:
mvn verifywas not run locally because this Windows session does not have a JDK installed; Java CI installs JDK 17.