
test(ci): repair merge-queue Integration Tests regressions#1909

Merged
danielmeppiel merged 2 commits into main from fix/mq-integration-regressions
Jun 25, 2026

Conversation

@danielmeppiel

Collaborator

TL;DR

The Integration Tests workflow (which runs only on merge_group events, not pull_request) was failing on three tests, blocking the merge queue for every PR. All three are test-only defects -- production behaviour is correct. This PR repairs them: two stale policy-discovery E2E assertions that predate cascading policy-repo discovery (#1830), and one brittle plugin E2E assertion that hard-coded a primitive type shipped by a mutable, SAML-protected external plugin.

No production code changes.

Problem (WHY)

Because the Integration Tests job triggers on merge_group, gh pr checks shows green on every PR while the queue itself fails -- so these regressions were invisible at PR-open time and surfaced only when a PR entered the merge queue (e.g. #1872, run 28187496504).

Three failures across two shards:

  1. test_emu_ssh_remote_routes_to_correct_org_policy_repo -- asserted result is sentinel. Cascading discovery (#1830) now tries .github -> .apm -> _apm and returns a fresh terminal absent result rather than propagating the per-candidate sentinel object, so the identity assertion no longer holds.
  2. test_ado_v3_ssh_remote_routes_to_correct_org_policy_repo -- mocked _fetch_from_repo, but ADO remotes now route through _fetch_from_ado_repo (#1830), so the mock was never called (call_count == 0).
  3. test_lockfile_preserved_on_sequential_install -- asserted "agent" in combined.lower() against the live, SAML-protected plugin github/awesome-copilot/plugins/context-engineering. The merge-queue run emitted zero "Cleaned up N integrated ..." lines, meaning the plugin's deployed primitive mix drifted upstream.
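
The first failure can be illustrated with a minimal sketch. Only the candidate order (.github -> .apm -> _apm) and the contoso org come from this PR; `FetchResult`, `discover`, and `fake_fetch` are hypothetical stand-ins, not the project's real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchResult:
    """Hypothetical stand-in for a policy fetch outcome."""
    outcome: str  # "found" or "absent"
    source: str = ""


# Candidate order as described in the PR.
CANDIDATE_REPOS = (".github", ".apm", "_apm")


def discover(org, fetch):
    """Try each candidate in order; return the first hit or a fresh 'absent' result."""
    for repo in CANDIDATE_REPOS:
        result = fetch(f"{org}/{repo}")
        if result.outcome == "found":
            return result
    # A new terminal object, not the object a per-candidate fetch returned.
    return FetchResult(outcome="absent")


sentinel = FetchResult(outcome="absent", source="sentinel")
calls = []


def fake_fetch(repo_ref):
    """Record the requested repo and always answer with the sentinel."""
    calls.append(repo_ref)
    return sentinel


result = discover("contoso", fake_fetch)

assert result is not sentinel          # the old identity assertion would fail here
assert result.outcome == "absent"      # an outcome assertion still holds
assert calls[0] == "contoso/.github"   # routing guard: first candidate is the org's .github
```

The routing check on the first candidate is what guards the regression; the identity check only pinned an implementation detail.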

Approach (WHAT)

Keep every test's genuine regression-guard intent intact; replace assertions that coupled to implementation details (object identity, a specific fetch helper, a specific external primitive type) with assertions on observable contract.

Implementation (HOW)

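Policy tests (tests/integration/test_dep_url_parsing_e2e.py):

  • EMU test: replaced the result is sentinel identity assertion with an outcome assertion; the #1159 routing guard (first candidate == contoso/.github) is preserved.
  • ADO test: re-pointed the patch at _fetch_from_ado_repo and asserted that the real org (realorg, not v3) is passed as a kwarg.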
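
A rough sketch of the stale-mock problem in the ADO test, using `unittest.mock`. The helper names `_fetch_from_repo` and `_fetch_from_ado_repo` come from this PR; the `Discovery` class, the routing condition, the remote URL, and the `org`/`project` keyword names are assumptions made for illustration.

```python
from unittest import mock


class Discovery:
    """Hypothetical stand-in for the discovery module."""

    @staticmethod
    def _fetch_from_repo(repo_ref):
        raise RuntimeError("network access not expected in this sketch")

    @staticmethod
    def _fetch_from_ado_repo(*, org, project):
        raise RuntimeError("network access not expected in this sketch")

    @classmethod
    def discover(cls, remote):
        # Assumed routing: ADO v3 SSH remotes go to the ADO-specific helper.
        if remote.startswith("git@ssh.dev.azure.com:v3/"):
            org, project = remote.split("v3/", 1)[1].split("/")[:2]
            return cls._fetch_from_ado_repo(org=org, project=project)
        return cls._fetch_from_repo(remote)


REMOTE = "git@ssh.dev.azure.com:v3/realorg/proj/repo"  # hypothetical remote

with mock.patch.object(Discovery, "_fetch_from_repo") as stale:
    with mock.patch.object(Discovery, "_fetch_from_ado_repo") as ado:
        Discovery.discover(REMOTE)
        stale_calls = stale.call_count
        ado_kwargs = ado.call_args.kwargs

assert stale_calls == 0                # patching the old helper observes nothing
assert ado_kwargs["org"] == "realorg"  # the real org, not the "v3" path segment
```

Patching the helper that the code path actually reaches, and asserting on the arguments it receives, keeps the test tied to routing rather than to which helper existed when the test was written.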

Plugin test (tests/integration/test_plugin_e2e.py):

  • Root-caused via a hermetic offline reproduction (the live plugin is unreachable without SAML): a local marketplace-plugin fixture installed, then a local skill installed sequentially, then the plugin uninstalled. Result: the plugin's deployed_files is preserved across the sequential install and uninstall reports Cleaned up 1 integrated agents and deletes the file. The production sequential-install + uninstall-cleanup path is correct; the failure is external-content drift.
  • Added that offline scenario as a permanent, network-free regression test (test_local_plugin_uninstall_after_sequential_skill_install_cleans_deployed_files, gated by requires_apm_binary) so this path is guarded without depending on mutable external content.
  • Hardened the live network test to validate cleanup against the files actually deployed (assert integration cleanup is reported when files were deployed, and that no deployed files remain on disk) instead of hard-coding the word agent.
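
The hardened check could look roughly like the sketch below. The "Cleaned up N integrated ..." line format is quoted from this PR; the helper name `assert_cleanup_matches_deployed`, its parameters, and the example file path are hypothetical and do not mirror the actual test code.

```python
import re
import tempfile
from pathlib import Path

# Matches lines such as "Cleaned up 1 integrated agents" for any primitive type.
CLEANUP_LINE = re.compile(r"Cleaned up \d+ integrated \w+")


def assert_cleanup_matches_deployed(project_root, deployed_files, uninstall_output):
    """Validate uninstall against the files recorded as deployed, not a fixed primitive name."""
    if deployed_files:
        assert CLEANUP_LINE.search(uninstall_output), "expected an integration cleanup line"
    leftovers = [rel for rel in deployed_files if (project_root / rel).exists()]
    assert not leftovers, f"deployed files remain on disk: {leftovers}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    deployed = [".github/agents/example.agent.md"]  # hypothetical deployed path
    target = root / deployed[0]
    target.parent.mkdir(parents=True)
    target.write_text("placeholder")

    target.unlink()  # simulate the uninstall step deleting the file
    assert_cleanup_matches_deployed(root, deployed, "Cleaned up 1 integrated agents")
```

Because the check is driven by the recorded file list, it passes whether the upstream plugin ships agents, prompts, or skills, and still fails if any deployed file survives uninstall.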

Trade-offs

  • The network test no longer fails if the upstream plugin changes its primitive mix -- intentional. Robust offline coverage of the same path is added so the contract (sequential install preserves deployed_files; uninstall cleans them) is still strictly enforced in CI.

Validation evidence

  • tests/integration/test_plugin_e2e.py: 10 passed, 11 skipped (network auto-skipped); new offline test passes.
  • tests/integration/test_dep_url_parsing_e2e.py: 14 passed.
  • Lint (CI-mirror, all green): ruff check, ruff format --check, pylint R0801 (10.00/10), scripts/lint-auth-signals.sh.
  • Branch is current with main (0 behind).

How to test

uv run pytest tests/integration/test_dep_url_parsing_e2e.py -q
uv run pytest tests/integration/test_plugin_e2e.py::TestPluginHeroScenarios::test_local_plugin_uninstall_after_sequential_skill_install_cleans_deployed_files -q

The hardened live test runs in the merge_group Integration Tests job where the binary and network are available.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danielmeppiel and others added 2 commits June 25, 2026 23:01
The merge-queue Integration Tests job (merge_group event) failed on two
stale assertions in test_dep_url_parsing_e2e.py that predate the
cascading policy-repo discovery added in #1830:

- EMU test asserted `result is sentinel`, but cascade discovery
  (.github -> .apm -> _apm) returns a fresh terminal "absent" result
  rather than propagating the per-candidate sentinel object. The #1159
  regression guard is the ROUTING (first candidate == contoso/.github),
  which is preserved; the identity assertion is replaced with an
  outcome assertion.
- ADO test mocked `_fetch_from_repo`, but ADO remotes now route through
  `_fetch_from_ado_repo` (#1830), so the mock was never called
  (call_count == 0). Re-point the patch at `_fetch_from_ado_repo` and
  assert the real org (`realorg`, not `v3`) is passed as a kwarg.

Production discovery is correct; these are test-only fixes that keep the
#1159 SCP-regex routing regression guard intact while being
cascade-compatible.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Offline local plugin repro shows that sequential installs preserve deployed_files and that uninstall cleans deployed agent, prompt, and skill files. The live SAML-protected plugin can change its primitive mix, so the network E2E now validates cleanup against the files actually deployed instead of requiring an agent cleanup line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 25, 2026 21:10

Copilot AI left a comment

Contributor

Pull request overview

Repairs regressions in the merge-queue-only Integration Tests workflow by updating brittle E2E assertions to match the current observable contracts (post-#1830 cascading policy discovery and plugin uninstall cleanup behavior), without changing any production code.

Changes:

  • Updates EMU/ADO policy-discovery E2E tests to assert terminal outcomes/inputs instead of implementation details (object identity, outdated fetch helper).
  • Hardens the live plugin sequential-install/uninstall E2E test to validate cleanup against actually deployed files rather than a hard-coded primitive type.
  • Adds a new offline, network-free regression test that exercises sequential install + uninstall cleanup using the local mock plugin fixture.

Summary per file:

| File | Description |
| --- | --- |
| tests/integration/test_plugin_e2e.py | Adds local sequential install/uninstall cleanup regression test and hardens the live network test to assert cleanup based on deployed files. |
| tests/integration/test_dep_url_parsing_e2e.py | Updates EMU/ADO routing assertions to align with cascading policy repo discovery and ADO-specific fetch path. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

@danielmeppiel danielmeppiel merged commit f1c0918 into main Jun 25, 2026
14 checks passed
@danielmeppiel danielmeppiel deleted the fix/mq-integration-regressions branch June 25, 2026 21:15