[WIP] fix: E2E performance pipeline flakiness and other improvements #60155
chrispader wants to merge 14 commits into
Is this ready?
const MAX_CHARACTERS_PER_FILE = 65536;
const FILE_SIZE_SAFETY_MARGIN = 1000;
// This is the maximum number of characters allowed to post to a GitHub comment body through the GitHub CLI
Are these comments necessary? And the commented out const, can that be removed?
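For context, the two constants in this diff suggest that the benchmark output is split into several files, each small enough to post as one comment. A minimal sketch of that idea follows, assuming a line-based split with the margin subtracted from the limit; `splitIntoChunks` is a hypothetical helper written for illustration and is not the code in this PR.

```typescript
const MAX_CHARACTERS_PER_FILE = 65536;
const FILE_SIZE_SAFETY_MARGIN = 1000;

// Hypothetical helper: greedily packs whole lines into chunks whose length
// stays within (limit - margin) characters. A single line longer than the
// budget is kept in its own chunk rather than being cut.
function splitIntoChunks(text: string, limit: number = MAX_CHARACTERS_PER_FILE, margin: number = FILE_SIZE_SAFETY_MARGIN): string[] {
    const budget = limit - margin;
    const chunks: string[] = [];
    let currentLines: string[] = [];
    let currentLength = 0;

    for (const line of text.split('\n')) {
        // +1 accounts for the newline that joins this line to the previous one.
        const addedLength = currentLines.length === 0 ? line.length : line.length + 1;
        if (currentLines.length > 0 && currentLength + addedLength > budget) {
            chunks.push(currentLines.join('\n'));
            currentLines = [line];
            currentLength = line.length;
        } else {
            currentLines.push(line);
            currentLength += addedLength;
        }
    }
    chunks.push(currentLines.join('\n'));
    return chunks;
}

// Small limits make the behavior easy to see: the budget here is 20 characters.
const sampleText = Array.from({length: 10}, (_, i) => `line ${i}`).join('\n');
const sampleChunks = splitIntoChunks(sampleText, 30, 10);
console.log(sampleChunks.length);
```

Joining the chunks with `\n` gives back the original text, so nothing is lost in the split.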
if ls "./Host_Machine_Files/\$WORKING_DIRECTORY"/output2.md 1> /dev/null 2>&1; then
# Print all the split files
for file in "./Host_Machine_Files/\$WORKING_DIRECTORY/output"*; do
for file in $(ls "./Host_Machine_Files/\$WORKING_DIRECTORY/output"* | sort -V); do
is this sorting just for cleaner output?
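For reference on the ordering question: a plain glob expands in lexicographic order, so `output10.md` comes before `output2.md` once there are ten or more split files, whereas `sort -V` compares the numeric parts as numbers. The same difference can be sketched in TypeScript; the file names below are made up for illustration.

```typescript
// Hypothetical file names standing in for the split output files.
const files = ['output10.md', 'output2.md', 'output1.md', 'output3.md'];

// The default sort compares strings by UTF-16 code units (lexicographic order).
const lexicographic = files.slice().sort();

// Numeric collation compares digit runs as numbers, similar in spirit to `sort -V`.
const collator = new Intl.Collator('en', {numeric: true});
const natural = files.slice().sort((a, b) => collator.compare(a, b));

console.log(lexicographic); // output1.md, output10.md, output2.md, output3.md
console.log(natural); // output1.md, output2.md, output3.md, output10.md
```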
No, I'm still actively working on it. This is just a draft PR for now. I still have to address the potential "flakiness" of the E2E performance pipelines.
Is this still being worked on?
Yes, I'll continue working on this in the next few days.
A quick update on my investigation into this issue
Most, if not all, of the recent flaky performance regression reports could be caused by "missed network cache hits". If we look into, e.g., the Logcat logs of this AWS Device Farm job, we can see a lot of
Problem
In the E2E performance pipeline, we store responses from network requests in a cache to mock network requests during the performance measurements, since long-running requests would distort the measurements. The network cache works by hashing the URL and payload of a request and later looking up the result in the cache by comparing the hashes. In a "warmup" run, we actually perform the network requests and store the results in the cache. In the actual test run, we then look up the result in the cache by the hash of the new request. This currently does not work for all types of requests, since the payload often does not fully match the one from the corresponding request in the warmup run, e.g. with
Solution
To fix this problem, we were thinking about changing the caching mechanism altogether, by restoring the initial Onyx state on each iteration and then deterministically fetching the network request results from the warmup run. This would work as follows:
TODO
To make this work, we would need to work on the following tasks:
What are your thoughts around this? cc @mountiny @hannojg @kirillzyusko
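To illustrate the cache-miss problem described in the update above, here is a minimal sketch of a hash-keyed response cache. It is an assumption-laden toy, not the pipeline's implementation: the hash function, the `MockRequest` type, the URL, and the payload fields (`reportID`, `requestTime`) are all invented for this example.

```typescript
type MockRequest = {url: string; payload: Record<string, unknown>};

// FNV-1a-style string hash; a stand-in for whatever hash the real cache uses.
function hashString(input: string): string {
    let hash = 0x811c9dc5;
    for (let i = 0; i < input.length; i++) {
        hash ^= input.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193);
    }
    return (hash >>> 0).toString(16);
}

// The cache key is derived from the URL and the serialized payload.
function cacheKey(request: MockRequest): string {
    return hashString(`${request.url}|${JSON.stringify(request.payload)}`);
}

const cache = new Map<string, string>();

// Warmup run: the request is performed (faked here) and the response is stored.
function storeFromWarmup(request: MockRequest, response: string): void {
    cache.set(cacheKey(request), response);
}

// Test run: the stored response is looked up by the hash of the new request.
function lookup(request: MockRequest): string | undefined {
    return cache.get(cacheKey(request));
}

// Invented request: the URL and both payload fields are hypothetical.
storeFromWarmup({url: '/api/ExampleCommand', payload: {reportID: '1', requestTime: 1000}}, 'cached response');

const sameRequest = lookup({url: '/api/ExampleCommand', payload: {reportID: '1', requestTime: 1000}});
const changedRequest = lookup({url: '/api/ExampleCommand', payload: {reportID: '1', requestTime: 2000}});

console.log(sameRequest); // 'cached response'
console.log(changedRequest); // undefined: one differing field changes the key
```

Any payload field that varies between the warmup run and the test run produces a different key, so the lookup falls through to a miss even though the request is logically the same.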
@chrispader since almost everything in the app works offline and we care about frontend performance, could we just test everything offline with the optimistic data once the user signs in? Then we wouldn't need to care about the cache.
Then we could also just import the state together with the session and skip authentication completely.
@mountiny if that's an option and fine with you, then yes, let's go for it. This will reduce complexity a lot. I still want to mention that for some flows and use cases we might then not be able to do E2E performance regression testing, e.g. things related to chat pagination/scrolling and loading new messages, or maybe extended search with data that is not stored locally yet.
I think the main issue might be that if a new collection/ is introduced or changed, we would need to update the Onyx state in the tests too, and that can change the times. But if we also find a way to control the state size, it should be OK.
@mountiny I will try to wrap this up over the weekend. I was testing the current E2E tests, and some of them didn't really work properly in offline mode. E.g. we show placeholder views instead of the list in the search screen when the user hasn't been online yet, which causes the tests to hang forever and kind of makes them obsolete. Do you think we should change any of the actual app behavior for the purpose of fixing the E2E performance tests? I feel like this could be a scenario where we would actually want to always show some results in search, rather than just saying "You are offline, there are no results" or something similar.
I don't think we should be changing that now.
After catching up on the discussion, here are my thoughts:
Instead of picking one extreme, why not split our tests based on what they actually need? Some can run offline, some need network mocking, and we keep the approach that makes sense for each group.
That all makes sense! Right now, though, I think we should take a step back and look at where the E2E tests are, what exactly we want from them, and whether they have been solving the problems we wanted them to solve. I'm going to have to discuss that with the team, so please keep this on hold for now.
I will close this one for now.
@mountiny
Explanation of Change
Fixes issues with the order and number of GH comments (for split-up output files) and the flakiness of some performance metrics.
Fixed Issues
$
PROPOSAL:
Tests
Offline tests
QA Steps
// TODO: These must be filled out, or the issue title must include "[No QA]."
PR Author Checklist
Screenshots/Videos
Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
MacOS: Desktop