test(ci): repair merge-queue Integration Tests regressions #1909
Merged
The merge-queue Integration Tests job (merge_group event) failed on two stale assertions in test_dep_url_parsing_e2e.py that predate the cascading policy-repo discovery added in #1830:

- EMU test asserted `result is sentinel`, but cascade discovery (.github -> .apm -> _apm) returns a fresh terminal "absent" result rather than propagating the per-candidate sentinel object. The #1159 regression guard is the ROUTING (first candidate == contoso/.github), which is preserved; the identity assertion is replaced with an outcome assertion.
- ADO test mocked `_fetch_from_repo`, but ADO remotes now route through `_fetch_from_ado_repo` (#1830), so the mock was never called (call_count == 0). Re-point the patch at `_fetch_from_ado_repo` and assert the real org (`realorg`, not `v3`) is passed as a kwarg.

Production discovery is correct; these are test-only fixes that keep the #1159 SCP-regex routing regression guard intact while being cascade-compatible.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
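The ADO half of this fix can be illustrated with a small, self-contained sketch of why a mock on the wrong helper records zero calls. `PolicyDiscovery`, its `discover` method, and the host check are hypothetical stand-ins invented for illustration; only the helper names `_fetch_from_repo` and `_fetch_from_ado_repo` and the `realorg` value come from the commit message.

```python
from unittest import mock


class PolicyDiscovery:
    """Hypothetical stand-in for the discovery code, not the project's real class."""

    def _fetch_from_repo(self, repo, **kwargs):
        return f"generic:{repo}"

    def _fetch_from_ado_repo(self, repo, *, org, **kwargs):
        return f"ado:{org}/{repo}"

    def discover(self, host, org, repo):
        # Assumed routing rule: ADO hosts go through the ADO-specific helper.
        if host == "ssh.dev.azure.com":
            return self._fetch_from_ado_repo(repo, org=org)
        return self._fetch_from_repo(repo)


def run_with_patch(target):
    """Patch one helper, run an ADO discovery, and return the mock for inspection."""
    discovery = PolicyDiscovery()
    with mock.patch.object(PolicyDiscovery, target, return_value="stub") as patched:
        discovery.discover("ssh.dev.azure.com", "realorg", ".github")
    return patched


# Stale test shape: the patched helper is no longer on the ADO path.
stale = run_with_patch("_fetch_from_repo")

# Repaired test shape: patch the helper the ADO path actually calls.
fixed = run_with_patch("_fetch_from_ado_repo")
```

In this sketch `stale.call_count` stays at 0 while `fixed` records one call whose `org` kwarg can be asserted directly, which mirrors the failure mode and the repair described above.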
Offline local plugin repro shows sequential installs preserve deployed_files and uninstall cleans deployed agent, prompt, and skill files. The live SAML-protected plugin can change its primitive mix, so the network E2E now validates cleanup against files actually deployed instead of requiring an agent cleanup line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
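A minimal sketch of validating cleanup against the files actually deployed, rather than against a fixed primitive type, might look like the following. The helper name `leftover_deployed_files`, the example paths, and the simulated uninstall step are all assumptions made for illustration; the real test drives the apm binary and reads `deployed_files` from the lockfile.

```python
import tempfile
from pathlib import Path


def leftover_deployed_files(project_root, deployed_files):
    """Return the recorded deployed paths that still exist under project_root."""
    root = Path(project_root)
    return [rel for rel in deployed_files if (root / rel).exists()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical deployed paths; a real run would take these from the lockfile.
    deployed = [".github/prompts/demo.prompt.md", ".github/skills/demo/SKILL.md"]
    for rel in deployed:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder")

    before = leftover_deployed_files(root, deployed)

    # Stand-in for the uninstall step: remove every recorded file.
    for rel in deployed:
        (root / rel).unlink()

    after = leftover_deployed_files(root, deployed)
```

Because the check iterates over whatever was recorded as deployed, it keeps passing if the upstream plugin stops shipping agents and ships only prompts or skills.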
Pull request overview
Repairs regressions in the merge-queue-only Integration Tests workflow by updating brittle E2E assertions to match the current observable contracts (post-#1830 cascading policy discovery and plugin uninstall cleanup behavior), without changing any production code.
Changes:
- Updates EMU/ADO policy-discovery E2E tests to assert terminal outcomes/inputs instead of implementation details (object identity, outdated fetch helper).
- Hardens the live plugin sequential-install/uninstall E2E test to validate cleanup against actually deployed files rather than a hard-coded primitive type.
- Adds a new offline, network-free regression test that exercises sequential install + uninstall cleanup using the local mock plugin fixture.
| File | Description |
|---|---|
| tests/integration/test_plugin_e2e.py | Adds local sequential install/uninstall cleanup regression test and hardens the live network test to assert cleanup based on deployed files. |
| tests/integration/test_dep_url_parsing_e2e.py | Updates EMU/ADO routing assertions to align with cascading policy repo discovery and ADO-specific fetch path. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 0
TL;DR
The `Integration Tests` workflow (which runs only on `merge_group` events, not `pull_request`) was failing on three tests, blocking the merge queue for every PR. All three are test-only defects -- production behaviour is correct. This PR repairs them: two stale policy-discovery E2E assertions that predate cascading policy-repo discovery (#1830), and one brittle plugin E2E assertion that hard-coded a primitive type shipped by a mutable, SAML-protected external plugin.

No production code changes.
Problem (WHY)
Because the `Integration Tests` job triggers on `merge_group`, `gh pr checks` shows green on every PR while the queue itself fails -- so these regressions were invisible at PR-open time and surfaced only when a PR entered the merge queue (e.g. #1872, run 28187496504).

Three failures across two shards:

- `test_emu_ssh_remote_routes_to_correct_org_policy_repo` -- asserted `result is sentinel`. Cascading discovery (#1830, "Add cascading policy repo discovery with ADO support") now tries `.github` -> `.apm` -> `_apm` and returns a fresh terminal `absent` result rather than propagating the per-candidate sentinel object, so the identity assertion no longer holds.
- `test_ado_v3_ssh_remote_routes_to_correct_org_policy_repo` -- mocked `_fetch_from_repo`, but ADO remotes now route through `_fetch_from_ado_repo` (#1830). The mock was never called (`call_count == 0`).
- `test_lockfile_preserved_on_sequential_install` -- asserted `"agent" in combined.lower()` against the live, SAML-protected plugin `github/awesome-copilot/plugins/context-engineering`. The merge-queue run emitted zero "Cleaned up N integrated ..." lines, meaning the plugin's deployed primitive mix drifted upstream.

Approach (WHAT)
Keep every test's genuine regression-guard intent intact; replace assertions that coupled to implementation details (object identity, a specific fetch helper, a specific external primitive type) with assertions on observable contract.
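To make the identity-versus-contract distinction concrete, here is a hedged sketch of a cascade that returns a fresh terminal result. `FetchResult`, `discover`, and `fake_fetch` are hypothetical names written for this example and are not the project's API; only the candidate order, the `contoso` org, and the `absent` outcome come from the description above.

```python
from dataclasses import dataclass

# Candidate order taken from the PR description.
CANDIDATES = (".github", ".apm", "_apm")


@dataclass(frozen=True)
class FetchResult:
    """Hypothetical result type carrying only what the sketch needs."""
    outcome: str
    source: str = ""


def discover(org, fetch):
    """Try each candidate repo; return the first hit, else a fresh 'absent' result."""
    for name in CANDIDATES:
        result = fetch(f"{org}/{name}")
        if result.outcome != "absent":
            return result
    # The terminal result is a new object, not any per-candidate result.
    return FetchResult(outcome="absent")


sentinel = FetchResult(outcome="absent", source="per-candidate")
calls = []


def fake_fetch(repo):
    # Record routing and report every candidate as absent.
    calls.append(repo)
    return sentinel


result = discover("contoso", fake_fetch)
```

Under these assumptions `result is sentinel` is false even though discovery behaved correctly, whereas `result.outcome == "absent"` and `calls[0] == "contoso/.github"` both hold, so the routing guard survives without pinning object identity.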
Implementation (HOW)
Policy tests (`tests/integration/test_dep_url_parsing_e2e.py`):

- EMU: keep the routing guard (first candidate == `contoso/.github`); replace `result is sentinel` with `result.outcome == "absent"` (cascade-correct terminal outcome).
- ADO: re-point the patch at `_fetch_from_ado_repo` and assert the real org (`realorg`, not `v3`) is passed as a kwarg.

Plugin test (`tests/integration/test_plugin_e2e.py`):

- Offline repro with the local mock plugin fixture: `deployed_files` is preserved across the sequential install, and uninstall reports `Cleaned up 1 integrated agents` and deletes the file. The production sequential-install + uninstall-cleanup path is correct; the failure is external-content drift.
- Add an offline regression test (`test_local_plugin_uninstall_after_sequential_skill_install_cleans_deployed_files`, gated by `requires_apm_binary`) so this path is guarded without depending on mutable external content.
- Harden the live network test to validate cleanup against the files actually deployed instead of requiring `agent`.

Trade-offs
Validation evidence
- `tests/integration/test_plugin_e2e.py`: 10 passed, 11 skipped (network auto-skipped); new offline test passes.
- `tests/integration/test_dep_url_parsing_e2e.py`: 14 passed.
- `ruff check`, `ruff format --check`, `pylint R0801` (10.00/10), `scripts/lint-auth-signals.sh`.
- `main` (0 behind).

How to test
The hardened live test runs in the `merge_group` Integration Tests job where the binary and network are available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>