-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8357258: x86: Improve receiver type profiling reliability #25305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8357258: x86: Improve receiver type profiling reliability #25305
Conversation
|
👋 Welcome back shade! A progress list of the required criteria for merging this PR into |
|
@shipilev This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
3a49d15 to
798b4c3
Compare
d2733a3 to
d5b7991
Compare
|
@shipilev This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply issue a |
|
@shipilev this pull request can not be integrated into git checkout JDK-8357258-x86-c1-optimize-virt-calls
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
d5b7991 to
3b601fc
Compare
|
In addition to reliability improvements, doing a denser loop allows to significantly optimize tier3 code density. With larger |
|
Looking for reviews! @dean-long, @vnkozlov, @veresov -- you would probably be interested in this. |
|
@shipilev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
|
/touch Not now, bot, still looking for reviewers. |
|
@shipilev The pull request is being re-evaluated and the inactivity timeout has been reset. |
|
@shipilev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
|
Shoo bots, still looking for reviewers. |
|
Oh, all right! This made me realize we actually have a secondary "fast" case: receiver is not found, but profile is full. This is pretty frequent with We now lose "only" 0.5ns in this test: The code is in new commits, passes |
|
This looks good. Thank you for cleaning up code and detailed comments. |
|
Overall, looks good to me. Nice work, Aleksey! I'm curious how performance-sensitive that part of code is. Does it make sense to try to further optimize it? For example:
[1] |
Yes, since
|
I don't think we need to optimize |
|
My testing of version 07 passed clean |
This is about 5-th-ish version of this code, so I don't think there is more juice to squeeze out of it. The current version is more or less optimal. The stratification into three cases looks the best performing overall.
Yeah, but putting checks for both installed receiver and nullptr slot turns out hurting performance; this is bad even without extra control flow. Two separate loops are more efficient, even for small number of iterations. It also helpfully optimizes for the best case, when profile is smaller than
I don't think it is worth the extra complexity, honestly. The loop-y code in current version is still a significant code density win over the decision-tree (unrolled, effectively) approach we are doing currently. Keeping this thing simple means more reliability and less testing surface, plus much less headache to port to other architectures. Note that the goal for this work is to improve profiling reliability without hopefully ceding too much ground in code density and performance. When I started out, it was not clear if it is doable, given the need for atomics; but it looks doable indeed. So I think we should call this thing done and move on to solving the actual performance problem in this code: the contention on counter updates. |
|
Thanks for the clarifications, Aleksey. Just wanted to get a sense how much performance-wise we leave on the table and whether it is worth to spend more time on it later. |
iwanowww
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can look on this later if we need to optimize it more. Thankfully it is in one place now.
I don't need to retest it since you didn't change code after v07 and only merged from mainline.
|
Hi @shipilev , are you aware of anyone working on or planning to implement the same for AArch64 by any chance? |
I'll task one of our folks to do it after NY break. Speaking of, I will integrate this one after NY break as well, to avoid dealing with any possible fallout during the holidays. |
|
Remerged from master, re-ran |
|
Here goes. /integrate |
That would be: https://bugs.openjdk.org/browse/JDK-8374513 |
See the bug for discussion what issues current machinery has.
This PR executes the plan outlined in the bug:
C1OptimizeVirtualCallProfilingto only claim slots when receiver is installedThis PR does not do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral.
Additional testing:
compiler/allProgress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305$ git checkout pull/25305Update a local copy of the PR:
$ git checkout pull/25305$ git pull https://git.openjdk.org/jdk.git pull/25305/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25305View PR using the GUI difftool:
$ git pr show -t 25305Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25305.diff
Using Webrev
Link to Webrev Comment