fix: prune stale container pins and move UpdateContainerPins to after compilation#39770
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Add PruneStaleContainerPins to ActionCache to remove entries no longer referenced by compiled lock files - Update UpdateContainerPins to prune stale entries and save when only pruning occurred (not just when new pins were added) - Move UpdateContainerPins to after compilation in upgrade and update commands so it reads fresh lock files with the current AWF version - Add tests for PruneStaleContainerPins and the pruning flow Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes container digest pin drift by pruning stale entries from actions-lock.json and reordering when pins are updated so the pin updater reads the post-compile .lock.yml image set.
Changes:
- Added
ActionCache.PruneStaleContainerPinsto evict container pins not referenced by current compiled lock files. - Updated container pin update flow to prune stale pins and persist the cache even when only pruning occurred.
- Moved
UpdateContainerPinsto run after compilation in bothupgradeandupdateflows.
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/action_cache.go | Adds stale container-pin pruning helper on the cache. |
| pkg/workflow/action_cache_container_pin_test.go | Adds unit tests covering prune behavior (stale/all/none/nil). |
| pkg/cli/update_container_pins.go | Prunes stale container pins based on lock-file image set and saves on prune-only updates. |
| pkg/cli/update_container_pins_test.go | Adds a test exercising pruning behavior using a real cache file on disk. |
| pkg/cli/upgrade_command.go | Moves container pin updates to after compilation in upgrade. |
| pkg/cli/update_command.go | Moves container pin updates to after workflow update/compile in update. |
| .github/workflows/test-quality-sentinel.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/smoke-copilot.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/smoke-copilot-arm.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/smoke-copilot-aoai-entra.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/smoke-copilot-aoai-apikey.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/smoke-claude.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/security-review.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/refiner.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/pr-triage-agent.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/pr-nitpick-reviewer.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/pr-code-quality-reviewer.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/mattpocock-skills-reviewer.lock.yml | Regenerated lock content (schema key change). |
| .github/workflows/grumpy-reviewer.lock.yml | Regenerated lock content (schema key change). |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 19/19 changed files
- Comments generated: 1
| // Build a set of images currently referenced in the compiled lock files so | ||
| // that stale entries (e.g. superseded AWF versions) can be pruned. | ||
| imageSet := make(map[string]bool, len(images)) | ||
| for _, img := range images { | ||
| imageSet[img] = true | ||
| } |
|
🧠 Matt Pocock Skills Reviewer failed to deliver outputs during the skills-based review. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. |
|
✅ Test Quality Sentinel completed test quality analysis. |
|
✅ PR Code Quality Reviewer completed the code quality review. |
🏗️ Design Decision Gate — ADR RequiredThis PR makes significant changes to core business logic (213 new lines across
📄 Draft ADR — copy into
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose, /tdd, and /grill-with-docs — requesting changes on one correctness gap and a test coverage issue.
📋 Key Themes & Highlights
Correctness
- Early-exit skips prune (
update_container_pins.go:73-77): whencollectImagesFromLockFilesreturns an empty slice,PruneStaleContainerPinsis never called. Stale pins accumulate indefinitely if all container workflows are removed. The prune step should run with an emptyimageSetbefore the early return.
Test Coverage
- Integration test bypasses
UpdateContainerPins:TestUpdateContainerPins_PrunesStaleEntriesmanually re-implements the cache-load + prune wiring from insideUpdateContainerPinsrather than calling the function. Future internal refactors won't be caught. - Missing edge-case test: no test for "empty lock files, non-empty stale cache" — the exact scenario the early-exit bug affects.
- No regression guard for step ordering: moving
UpdateContainerPinsto after compilation is the primary fix, but nothing would fail if the steps were accidentally re-ordered.
Documentation
- Unexplained
.lock.ymlschema changes: 13 files changeoptionalPositiveInteger→issueOrPRNumber, which is unrelated to container pins and not mentioned in the PR description.
Positive Highlights
- ✅ Root cause is clearly identified and symmetrically fixed in both
upgradeandupdatecommand paths - ✅
PruneStaleContainerPinsis clean: nil-safe, setsdirtyflag correctly, logs at the right level - ✅ Save-trigger fix (
len(updatedImages) > 0 || prunedCount > 0) is exactly right - ✅
TestPruneStaleContainerPins_NoneStaleverifiesdirtyis not set when nothing changes — a subtle but important guard - ✅ Four unit tests cover the full edge-case matrix for the new function
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer
Comments that could not be inline-anchored
pkg/cli/update_container_pins.go:77
[/diagnose] The len(images) == 0 early return skips PruneStaleContainerPins entirely. If a user removes all container-based workflows, collectImagesFromLockFiles returns an empty slice, the function exits here, and stale pins in actions-lock.json are never cleaned up.
<details>
<summary>💡 Suggested fix</summary>
Load the cache and run prune before the early exit, passing an empty imageSet when no images are found:
// Load cache + always prune (empty set removes all sta…
</details>
<details><summary>pkg/cli/update_container_pins_test.go:220</summary>
**[/tdd]** This test manually re-implements the `collectImagesFromLockFiles → imageSet → PruneStaleContainerPins` wiring that lives *inside* `UpdateContainerPins` rather than calling the function itself. If that internal wiring changes, this test stays green even though the real code path is broken.
<details>
<summary>💡 Suggestion</summary>
The Docker constraint is real, but consider extracting the prune-and-save logic into a helper that accepts a pre-loaded cache, making the real code path …
</details>
<details><summary>pkg/cli/upgrade_command.go:332</summary>
**[/diagnose]** The step ordering — `UpdateContainerPins` running after compilation — is the core behavioral fix in this PR, but there's no automated guard that enforces it. If Step 4b were accidentally moved back above Step 4, the bug would silently return with no failing test.
<details>
<summary>💡 Suggestion</summary>
Even a lightweight comment noting the ordering constraint is load-bearing helps reviewers:
```go
// Step 4b: MUST run after Step 4 (compilation).
// Compilation rewrites .lo…
</details>
<details><summary>pkg/cli/update_container_pins_test.go:176</summary>
**[/tdd]** Missing edge case: "no images in lock files, but stale pins in cache." Due to the early exit at line 73–77, `PruneStaleContainerPins` is never called in this scenario, so stale pins persist. A failing test here both documents the expected behaviour (pins should be cleared) and acts as a regression guard once the early-exit issue is fixed.
<details>
<summary>💡 Suggested test skeleton</summary>
```go
func TestPruneStillRunsWhenNoLockFiles(t *testing.T) {
// actions-lock.json has…
</details>
<details><summary>.github/workflows/grumpy-reviewer.lock.yml:745</summary>
**[/grill-with-docs]** These 13 `.lock.yml` changes (`optionalPositiveInteger` → `issueOrPRNumber` on `pull_request_number` parameters) appear to be a separate schema evolution that was bundled in because lock files were regenerated while testing this fix. They're not mentioned in the PR description and could confuse reviewers who are trying to understand the container pin scope.
If these are intentional (e.g., to support `#aw_` temporary ID references on `pull_request_number`), a single sente…
</details>
<details><summary>pkg/workflow/action_cache_container_pin_test.go:198</summary>
**[/tdd]** ✅ Setting `cache.dirty = false` before the call and then asserting `assert.False(t, cache.dirty, ...)` is a great touch — it explicitly guards against spurious saves when nothing changes. Easy to miss, important to have.
</details>
|
|
@copilot run pr-finisher skill |
Ran a full pr-finisher pass on this branch (HEAD |
gh aw upgrade/updatere-pinned digests for whatever container versions were already inactions-lock.jsonrather than the versionscompileactually emits. After compilation bumped the bundled AWF version (e.g.0.27.0→0.27.2),actions-lock.jsonretained the old entries indefinitely — no command reconciled the two.Root cause
UpdateContainerPinsran before compilation in bothupgradeandupdate. By the timecompilewrote new AWF versions into the.lock.ymlfiles, the pin-update step had already finished against the old lock files.Changes
pkg/workflow/action_cache.go— addPruneStaleContainerPins(knownImages map[string]bool) int: removesContainerPinsentries whose keys are absent from the current lock-file image set; returns the pruned count.pkg/cli/update_container_pins.go— before resolving new digests, build a set from the collected lock-file images and callPruneStaleContainerPinsto evict superseded entries (e.g. old AWF tags). Also save the cache when entries were only pruned, not just when new pins were added.pkg/cli/upgrade_command.go— moveUpdateContainerPinsfrom before compilation (old Step 3b) to after compilation (new Step 4b), so it reads lock files that already carry the current AWF version.pkg/cli/update_command.go— moveUpdateContainerPinsto afterUpdateActionsInWorkflowFiles(which drives compilation), for the same reason.After this change, the
containersblock inactions-lock.jsonalways reflects exactly the image tags present in the compiled.lock.ymlfiles — stale versions are pruned, new versions are pinned.