WIP: Increase runs/warmupRuns for flaky Reassure perf tests #744

Closed
abzokhattab wants to merge 1 commit into Expensify:main from abzokhattab:abzokhattab/increase-reassure-runs-for-flaky-tests

abzokhattab commented Mar 1, 2026

Summary

Follow-up to #741. $ Expensify/App#80320

The previous PR raised stability check thresholds from 10ms/20% to 20ms/40%, which helped but wasn't sufficient on its own. The delta check (base vs PR-head, 10ms/20%) is also affected by CI noise, and we don't want to loosen those thresholds since they catch real regressions.

This PR takes a different approach: it increases the number of measurement samples for the flakiest tests, so that averaging over more runs reduces the variance of the reported mean.

Tests updated with runs: 30, warmupRuns: 3 (default is runs: 10, warmupRuns: 1):

  • keyChanged: #1 most flaky test (33 failures in analysis)
  • fireCallbacks: #2 most flaky test (20 failures)
  • scheduleSubscriberUpdate: 7 failures
  • doAllCollectionItemsBelongToSameParent: low baseline (~1.2ms), extreme relative deviations
  • isValidNonEmptyCollectionForMerge: low baseline, extreme relative deviations
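
To make the two options concrete, here is a minimal sketch of what `runs` and `warmupRuns` typically mean in a measurement loop. `measureSync`, `MeasureOptions`, and `MeasureResult` are hypothetical stand-ins written for illustration; this is not Reassure's implementation, and only the option names and defaults are taken from this PR.

```typescript
// Hypothetical stand-in for a perf-test harness; NOT Reassure's actual code.
interface MeasureOptions {
    runs?: number; // measured samples (default 10, per this PR)
    warmupRuns?: number; // discarded samples (default 1, per this PR)
}

interface MeasureResult {
    meanMs: number;
    samples: number[];
}

function measureSync(fn: () => void, {runs = 10, warmupRuns = 1}: MeasureOptions = {}): MeasureResult {
    const samples: number[] = [];
    for (let i = 0; i < warmupRuns + runs; i++) {
        const start = Date.now();
        fn();
        const elapsed = Date.now() - start;
        // Warm-up iterations execute the code but are excluded from the statistics.
        if (i >= warmupRuns) {
            samples.push(elapsed);
        }
    }
    const meanMs = samples.reduce((sum, value) => sum + value, 0) / samples.length;
    return {meanMs, samples};
}

// Usage with the values chosen in this PR.
let calls = 0;
const result = measureSync(() => {
    calls++;
}, {runs: 30, warmupRuns: 3});
console.log(calls, result.samples.length); // 33 30
```

With `runs: 30, warmupRuns: 3` the function under test executes 33 times, and only the last 30 executions contribute to the mean.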

This improves signal quality for both the stability check and the delta check without loosening any thresholds.
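
The averaging argument can be illustrated with a small simulation. This is a sketch under stated assumptions: the noise is modelled as independent and uniform around a 1.2 ms baseline, which is an illustrative choice and not measured CI data, and real CI noise may be correlated between runs. All function names here are hypothetical.

```typescript
// Small deterministic PRNG (mulberry32) so the sketch is reproducible.
function mulberry32(seed: number): () => number {
    let a = seed >>> 0;
    return () => {
        a = (a + 0x6d2b79f5) >>> 0;
        let t = a;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// One simulated measurement: 1.2 ms baseline plus uniform noise in [-0.6, +0.6) ms.
// Both numbers are assumptions chosen for illustration.
function sampleMs(rand: () => number): number {
    return 1.2 + (rand() - 0.5) * 1.2;
}

function stdDev(values: number[]): number {
    const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
    const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (values.length - 1);
    return Math.sqrt(variance);
}

// Repeat a whole "test" many times and report how much its mean fluctuates.
function spreadOfMeans(runs: number, trials: number, seed: number): number {
    const rand = mulberry32(seed);
    const means: number[] = [];
    for (let t = 0; t < trials; t++) {
        let total = 0;
        for (let r = 0; r < runs; r++) {
            total += sampleMs(rand);
        }
        means.push(total / runs);
    }
    return stdDev(means);
}

const spread10 = spreadOfMeans(10, 2000, 1);
const spread30 = spreadOfMeans(30, 2000, 1);
console.log(`runs=10: ${spread10.toFixed(3)} ms, runs=30: ${spread30.toFixed(3)} ms`);
```

For independent noise, the standard deviation of the mean scales with 1/sqrt(runs), so going from 10 to 30 runs should shrink the spread by a factor of roughly sqrt(10/30) ≈ 0.58. The trade-off is about three times as many executions per test.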

Bonus fix: Boolean('false') evaluates to true in JS because any non-empty string is truthy, so the delta check printed the wrong error message ("during stability checks" instead of the correct message). Fixed by comparing the value with === 'true' instead.
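
The pitfall is easy to reproduce in isolation. The sketch below uses two hypothetical helpers, `naiveFlag` and `strictFlag`; they are not the names used in validateReassureOutput and only contrast the two ways of reading a string flag.

```typescript
// Buggy pattern: Boolean() only checks truthiness, and 'false' is a non-empty string.
function naiveFlag(value: string): boolean {
    return Boolean(value);
}

// Fixed pattern: only the exact string 'true' enables the flag.
function strictFlag(value: string | undefined): boolean {
    return value === 'true';
}

console.log(naiveFlag('false'), strictFlag('false')); // true false
console.log(naiveFlag('true'), strictFlag('true')); // true true
console.log(strictFlag(undefined)); // false
```

The strict comparison also treats a missing value as false, which is usually the safer default for a flag passed in as a string.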

Test plan

  • CI Reassure perf tests pass on this PR
  • Monitor subsequent PRs to verify reduced flakiness rate

…) bug

Add runs: 30 and warmupRuns: 3 to the top flaky perf tests
(keyChanged, fireCallbacks, scheduleSubscriberUpdate,
doAllCollectionItemsBelongToSameParent, isValidNonEmptyCollectionForMerge)
to reduce CI noise through statistical averaging.

Also fix Boolean('false') evaluating to true in validateReassureOutput,
which caused the delta check to print the wrong error message.
abzokhattab requested a review from a team as a code owner March 1, 2026 22:37
melvin-bot (bot) requested review from luacmartins and removed request for a team March 1, 2026 22:37
abzokhattab marked this pull request as draft March 1, 2026 22:38
abzokhattab changed the title from "Increase runs/warmupRuns for flaky Reassure perf tests" to "WIP: Increase runs/warmupRuns for flaky Reassure perf tests" Mar 1, 2026
abzokhattab closed this Mar 1, 2026