8373453: C2 SuperWord: must handle load slices that have loads with different memory inputs #29028
vnkozlov
left a comment
Good. I approve this conservative fix.
TobiHartmann
left a comment
Looks good to me too. Great that we have a regression test for this rare case now.
In VLoopMemorySlices::find_memory_slices, we analyze the memory slices. In some cases, we find only loads in the slice and no phi, so the memory input of the loads comes from before the loop. When I refactored the code, I assumed that all such loads have the same memory input. After all, any store before the loop must have happened before we enter the loop and execute any of its loads. The assumption held for a long time, but now we have a counterexample.
Summary: one load has its memory input optimized; the other is not put on the IGVN worklist again and keeps the old memory input (even though in this case we could have optimized it just the same). Thus, both choices of memory input are correct, and the assumption behind the assert does not hold.
Solution: Just bail out of auto vectorization if this assumption is violated. This is an edge case, and the assert has not been hit until the fuzzer found this example.
Alternatives: we could track the multiple memory inputs, but that would be more effort to implement, and hard to test because it is difficult to create examples.
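The bailout described under "Solution" can be sketched in standalone C++ as follows. This is a minimal illustration, not the HotSpot code: `LoadSketch` and `common_memory_input` are hypothetical names, and memory inputs are modeled as plain integer ids. The point is only that a disagreement between loads is reported to the caller, which can then give up on vectorizing, rather than being asserted against.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a load node; only the id of its memory input matters here.
struct LoadSketch {
  int node_id;
  int memory_input_id;
};

// Returns the memory input shared by all loads of a phi-less slice.
// Returns std::nullopt if there are no loads or if the loads disagree;
// a caller would treat that as "do not auto-vectorize this loop".
std::optional<int> common_memory_input(const std::vector<LoadSketch>& loads) {
  if (loads.empty()) {
    return std::nullopt;
  }
  const int first = loads.front().memory_input_id;
  for (const LoadSketch& load : loads) {
    if (load.memory_input_id != first) {
      return std::nullopt;  // the case that an assert would have rejected
    }
  }
  return first;
}
```

With the ids from the description (one load on memory input 7, the other on 711), the function returns an empty optional, which corresponds to the bailout path.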
Details
Below, look at 1145 LoadB and 1131 LoadB. One has memory input Param 7 (initial program state), the other 711 Phi (outer loop). Both loads are inside the 1147 CountedLoop, but their memory states come from outside the loop, both originally from 711 Phi. Then 1145 LoadB is optimized by LoadBNode::Ideal -> LoadNode::Ideal -> LoadNode::split_through_phi: it realizes that the backedge of the 711 Phi only passes through the 593 CallStaticJava, which cannot modify the Byte::value field read by the LoadB (unless it were to use reflection, but that unlocks undefined behavior anyway, so it can be ignored). So it is OK to split through the phi, as Byte::value cannot be modified during the outer loop.
1131 LoadB could also do the same optimization, but it just does not end up on the IGVN worklist. The issue is that we don't have any adequate notification that goes down through the MergeMem - Call - Proj structure. We did not want to have that until now, because in theory we could have a long series of calls, and the traversals could become too expensive.
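To illustrate how two equivalent loads can end up with different memory inputs, here is a toy worklist model in standalone C++. It is a sketch under simplifying assumptions and does not use HotSpot APIs: `ToyLoad`, `toy_ideal`, and `run_worklist` are hypothetical names, and the ids 711 and 7 are borrowed from the description above. Only nodes that are on the worklist get their memory input rewritten, so a load that is never enqueued keeps the old (still correct) input.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical toy model of a load and its memory input.
struct ToyLoad {
  int node_id;
  int memory_input_id;
};

constexpr int kOuterLoopPhi = 711;  // memory phi of the outer loop
constexpr int kInitialMemory = 7;   // initial program state

// Stand-in for splitting through the phi: a load that reads from the
// outer-loop phi is rewired to the initial memory state.
// Returns true if the load was changed.
bool toy_ideal(ToyLoad& load) {
  if (load.memory_input_id == kOuterLoopPhi) {
    load.memory_input_id = kInitialMemory;
    return true;
  }
  return false;
}

// Optimize only the loads whose indices are on the worklist.
void run_worklist(std::vector<ToyLoad>& loads, std::deque<std::size_t> worklist) {
  while (!worklist.empty()) {
    const std::size_t index = worklist.front();
    worklist.pop_front();
    toy_ideal(loads[index]);
  }
}
```

If both loads start on input 711 and only the first is on the worklist, the first ends on input 7 while the second stays on 711. Both states are valid, yet they differ, which is exactly the situation the old assert did not expect.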
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/29028/head:pull/29028
$ git checkout pull/29028
Update a local copy of the PR:
$ git checkout pull/29028
$ git pull https://git.openjdk.org/jdk.git pull/29028/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 29028
View PR using the GUI difftool:
$ git pr show -t 29028
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/29028.diff