fix(ci): fail loudly on disabled-cache test cache poisoning#23702
Closed
AztecBot wants to merge 1 commit into
Closed
fix(ci): fail loudly on disabled-cache test cache poisoning#23702AztecBot wants to merge 1 commit into
AztecBot wants to merge 1 commit into
Conversation
A 'disabled-cache' test command's cache key is a constant string rather than a content hash, so once a pass was written to the test success cache the command matched on every later run and was skipped permanently (e.g. all TXE tests on merge-train/fairies; PR #23106 logged 709 SKIPPED / 0 run). Stop writing success-cache entries for disabled-cache commands so the cache can no longer be poisoned, and treat any pre-existing poisoned entry as a fatal CI error (purging it) instead of silently skipping. This mirrors cache_content_hash, which already hard-fails in CI when uncommitted/untracked files would otherwise disable the cache.
Collaborator
Author
|
Automatically closing this stale claudebox draft PR (no updates for 5+ days). Re-open if still needed. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
A
disabled-cachetest command's success-cache key is a constant string (hash_str_orig "disabled-cache run_test.sh …"), not a content hash. So oncerun_test_cmdwrote a pass into the success cache (SETEX $key …), every later run hashed the same constant string, hit the cache, and printedSKIPPED— the test never ran again. This is what skipped all TXE tests onmerge-train/fairies: PR #23106 logged 709 SKIPPED / 0 executed (http://ci.aztec-labs.com/b893262917583a30).This is the same class of failure Charlie flagged in #team-fairies: a
disabled-cachemarker should never silently win — it should be loud.Fix — make it fatal in CI, in both cases
There are two ways a
disabled-cachemarker can reach the test cache, and both should be fatal in CI rather than silently disabling a test:ci3/cache_content_hash) — *already_ hard-fails (exit 1) whenCI=1. No change needed.disabled-cachecommand (ci3/run_test_cmd) — previously served as a silentSKIPPED. This PR makes it fatal:disabled-cachecommand, so the cache can no longer be poisoned.exit 1with a clear error instead of skipping.Relationship to #23658
This supersedes the
run_test_cmdchange in #23658. That PR made the poisoned entry a silent bypass (run the test anyway); per the thread, the team wants it to be a fatal error. Same one line, opposite resolution — #23658 should be closed or rebased onto this so the two don't conflict when the train merges up. Everything else in #23658's diagnosis is correct.Rollout note
The poisoned keys are constant and currently live in redis with up to a 7‑day TTL. On the first CI run after this lands, each poisoned
disabled-cachecommand will fatal once and purge its own key; subsequent runs are clean and the tests run normally. If you'd rather avoid the one-time red, flush the affected redis keys at deploy time — happy to provide the key pattern.Testing
bash -n ci3/run_test_cmdpasses. The change is gated onCI=1/USE_TEST_CACHE=1/CI_REDIS_AVAILABLE=1, which aren't reproducible locally without the CI redis; behavior is otherwise unchanged for non-disabled-cachecommands.Created by claudebox · group:
slackbot