Wait for running state before deleting in lifecycle E2E test#4745
Conversation
The workload lifecycle test intermittently times out (12/147 runs on main, 8.2% failure rate) because it deletes the workload before the proxy-runner finishes setting status to "running". The proxy-runner then overwrites the "removing" status back to "running", causing the delete verification to never complete. Add a "wait for running" poll between the list verification and delete steps, matching the pattern used by the stable "delete running workload" test. This closes the race window between proxy-runner startup and deletion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a /deflake skill that analyzes GitHub Actions failures on main to discover, rank, and plan fixes for flaky tests. The skill includes a Python collection script that deterministically fetches failed run logs in parallel, extracts test names from Ginkgo and gotestfmt output, and aggregates failures into a ranked report. Used this skill to identify and fix the #1 flake (workload lifecycle E2E test, 12/147 runs) in #4745. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4745 +/- ##
==========================================
+ Coverage 68.75% 68.76% +0.01%
==========================================
Files 516 516
Lines 54229 54307 +78
==========================================
+ Hits 37283 37343 +60
- Misses 14099 14110 +11
- Partials 2847 2854 +7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
LGTM — the added poll correctly gates deletion on WorkloadStatusRunning, directly fixing the documented 8.2% flake.
Extended reasoning...
Overview
Single-file change to test/e2e/api_workloads_test.go: adds 12 lines inserting a "Step 3.5" Eventually poll that waits for WorkloadStatusRunning before the delete step in the lifecycle test.
Security risks
None — this is an E2E test file with no production code changes.
Level of scrutiny
Low. The change is mechanical, isolated to a test, and follows the exact same polling pattern used elsewhere in the same file (e.g., the "should successfully delete stopped workload" test). The fix is a textbook solution to the described race condition.
Other factors
The inline bug comment flags a pre-existing issue in a different test ("should successfully delete running workload") that was not introduced by this PR and is out of scope for this fix. The PR description already acknowledges the underlying production race as a separate tracked issue. The fix itself is correct and sufficient for its stated goal.
* Add deflake skill for finding and fixing flaky tests Adds a /deflake skill that analyzes GitHub Actions failures on main to discover, rank, and plan fixes for flaky tests. The skill includes a Python collection script that deterministically fetches failed run logs in parallel, extracts test names from Ginkgo and gotestfmt output, and aggregates failures into a ranked report. Used this skill to identify and fix the #1 flake (workload lifecycle E2E test, 12/147 runs) in #4745. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address review feedback on collect-flakes script - Extract per-test log context (50 lines before the failure marker) before classifying failure mode, so tests in the same run get accurate individual mode labels instead of all inheriting the first match from the full run log - Add try/except around future.result() so one failed run doesn't crash the script and lose all collected data - Fix misleading comment about MAX_PAGES covering 300 Main build runs — the API returns all workflows' runs, not just Main build Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix per-test failure mode extraction in collect-flakes The previous attempt to extract per-test failure context used a 50-line window before the [FAIL] summary line, but Ginkgo's [FAILED] reason line (e.g., "Timed out after 120s") can appear thousands of lines earlier. Also needed ANSI stripping when searching for [FAILED] markers. Now searches backwards from the [FAIL] summary to find all [FAILED] lines in the failure block, uses the earliest one (which has the actual failure reason), and extracts context spanning all of them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
should track workload through create-list-delete lifecycle) intermittently times out on main — 12/147 runs (8.2% failure rate), always at 120sType of change
Test plan
task lint-fix)Verified the fix compiles. The flake requires CI to reproduce — this PR should reduce the
api-workloadsE2E failure rate from ~8% to near zero for this test.Special notes for reviewers
The underlying production race (proxy-runner can overwrite "removing" status with "running" during concurrent delete + startup) is a separate issue worth tracking. This PR only fixes the test-side trigger.
Generated with Claude Code