test: deflake integration tests by polling instead of fixed sleeps #844

Open: vdusek wants to merge 6 commits into master from test/fix-flaky-unlock-requests

@vdusek (Contributor) commented Jun 2, 2026

Summary

This started as a fix for a flaky CI run of test_request_queue_unlock_requests[sync] (assert 2 == 3 on unlocked_count, caused by replication lag between list_and_lock_head and unlock_requests) and grew into deflaking the integration test suite as a whole.

Changes:

  • Add a poll_until_condition helper to tests/integration/_utils.py: it polls a sync-or-async callable until a condition holds or a wall-clock timeout expires, at a constant interval by default. An optional backoff_factor multiplies the interval after each poll, for highly variable waits (e.g. Actor run container startup) where a growing delay covers a long timeout with few calls.
  • Fix the flaky unlock test by polling list_head until the locked IDs disappear from the queue head before unlocking.
  • Replace all hand-rolled for _ in range(5): sleep(1); read; break polling loops (10×) and single fixed sleep(1) waits (26×) across the request queue, dataset, key-value store, and run integration tests with poll_until_condition. This makes the tests both faster on the happy path (no unconditional sleep) and more robust under load (polls until the timeout instead of hoping 1 s is enough).
  • Generalize maybe_await to accept any awaitable.
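For readers without the diff open, the helper described above could look roughly like the sketch below. It is not the code from tests/integration/_utils.py: only the names poll_until_condition, maybe_await and backoff_factor come from this PR, while the other parameter names, the defaults, the async-only shape, and the choice to return the last result on timeout (rather than raise) are assumptions made for illustration.

```python
import asyncio
import inspect
import time
from typing import Awaitable, Callable, TypeVar, Union

T = TypeVar("T")


async def maybe_await(value: Union[T, Awaitable[T]]) -> T:
    """Return `value` as-is, or await it first if it is any awaitable."""
    if inspect.isawaitable(value):
        return await value
    return value


async def poll_until_condition(
    fn: Callable[[], Union[T, Awaitable[T]]],
    condition: Callable[[T], bool],
    *,
    timeout: float = 5.0,         # assumed default
    poll_interval: float = 0.1,   # assumed name and default
    backoff_factor: float = 1.0,  # 1.0 keeps the interval constant
) -> T:
    """Call `fn` until `condition(result)` holds or `timeout` seconds elapse.

    Assumption: the last result is returned either way, so the caller's own
    assert produces the failure message.
    """
    deadline = time.monotonic() + timeout
    interval = poll_interval
    while True:
        result = await maybe_await(fn())
        if condition(result):
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return result
        # Never sleep past the deadline; grow the interval only if asked to.
        await asyncio.sleep(min(interval, remaining))
        interval *= backoff_factor
```

On the happy path the first call already satisfies the condition and the helper returns without sleeping, which is where the speed-up over an unconditional sleep(1) comes from.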

The three iterate_keys sleeps in the KVS tests are intentionally left as-is: draining an iterator per attempt wants attempt-count semantics (like collect_iterate_until_present), not a wall-clock deadline.
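To illustrate what attempt-count semantics means here, a minimal sketch follows. The function drain_until_present and all of its parameters are hypothetical and are not the signature of collect_iterate_until_present; the point is only that each attempt drains a fresh iterator in full, so the budget is counted in drains rather than in seconds.

```python
import time
from typing import Callable, Iterable, List, Set


def drain_until_present(
    make_iterator: Callable[[], Iterable[str]],  # hypothetical: returns a fresh iterator per call
    expected: Set[str],
    *,
    max_attempts: int = 5,
    delay: float = 1.0,
) -> List[str]:
    """Drain a fresh iterator per attempt until every expected key is seen."""
    keys: List[str] = []
    for attempt in range(max_attempts):
        keys = list(make_iterator())  # one full drain counts as one attempt
        if expected.issubset(keys):
            break
        if attempt < max_attempts - 1:
            time.sleep(delay)  # no sleep after the final attempt
    return keys
```

A wall-clock deadline fits this case less well because a single drain can take an unpredictable share of the time budget.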

Follow-up to #786.

@vdusek added the labels adhoc (Ad-hoc unplanned task added during the sprint.) and t-tooling (Issues with this label are in the ownership of the tooling team.) Jun 2, 2026
@vdusek self-assigned this Jun 2, 2026
@github-actions Bot added this to the 142nd sprint - Tooling team milestone Jun 2, 2026
@github-actions Bot added the tested label (Temporary label used only programmatically for some analytics.) Jun 2, 2026
codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.82%. Comparing base (d2ab2ae) to head (c856ab1).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #844      +/-   ##
==========================================
+ Coverage   94.68%   94.82%   +0.13%     
==========================================
  Files          48       48              
  Lines        5045     5045              
==========================================
+ Hits         4777     4784       +7     
+ Misses        268      261       -7     
Flag          Coverage Δ
integration   93.24% <ø> (+0.23%) ⬆️
unit          83.52% <ø> (ø)

Flags with carried forward coverage won't be shown.

@vdusek changed the base branch from master to test/unify-polling-helpers June 2, 2026 13:44
@vdusek changed the base branch from test/unify-polling-helpers to master June 2, 2026 13:53
@vdusek force-pushed the test/fix-flaky-unlock-requests branch from 44e03e0 to ad43eec June 2, 2026 13:53
vdusek added 2 commits June 3, 2026 12:05
  • Add a wall-clock-deadline `poll_until_condition` helper, generalize `maybe_await` to any awaitable, and refactor `collect_iterate_until_present` to reuse a shared drain step.
  • Lock writes propagate asynchronously after `list_and_lock_head` returns, so unlocking immediately could see fewer locks than acquired; poll `list_head` until the locked IDs disappear from the queue head before unlocking.
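The race in the second commit can be imitated with a toy in-memory queue, sketched below. FakeQueue is hypothetical and does not reflect the real client's signatures; it borrows only the method names list_and_lock_head, list_head and unlock_requests, and it simplifies the lag to all-or-nothing visibility (the real failure saw 2 of 3 locks). The helper wait_until_locks_visible is likewise an illustrative stand-in for the polling step.

```python
import time
from typing import List, Set


class FakeQueue:
    """Toy queue whose lock writes become visible only after `lag` seconds."""

    def __init__(self, ids: List[str], lag: float) -> None:
        self._ids = ids
        self._lag = lag
        self._locked: Set[str] = set()
        self._visible_at = 0.0

    def _locks_visible(self) -> bool:
        return time.monotonic() >= self._visible_at

    def list_and_lock_head(self, limit: int) -> List[str]:
        locked = self._ids[:limit]
        self._locked = set(locked)
        self._visible_at = time.monotonic() + self._lag  # locks propagate later
        return locked

    def list_head(self) -> List[str]:
        if not self._locks_visible():
            return list(self._ids)  # stale view: locked IDs still at the head
        return [i for i in self._ids if i not in self._locked]

    def unlock_requests(self) -> int:
        if not self._locks_visible():
            return 0  # stale view: nothing to unlock yet
        count = len(self._locked)
        self._locked.clear()
        return count


def wait_until_locks_visible(queue: FakeQueue, locked_ids: List[str], timeout: float = 5.0) -> None:
    """Poll list_head until none of the locked IDs remain in the head."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not set(locked_ids) & set(queue.list_head()):
            return
        time.sleep(0.01)
```

Calling unlock_requests right after list_and_lock_head undercounts in this toy, while waiting for the locked IDs to leave the head first yields the full count.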
@vdusek force-pushed the test/fix-flaky-unlock-requests branch from 1cb554e to e23e54d June 3, 2026 10:06
@vdusek changed the title from "test: fix flaky test_request_queue_unlock_requests" to "test: deflake integration tests by polling instead of fixed sleeps" Jun 3, 2026
@vdusek requested a review from janbuchar June 3, 2026 10:41