[Linux]: faster getCurrentThreadUserTime() #29032
before:

| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| ThreadMXBeanBench.getCurrentThreadUserTime | sample | 4347067 | 81.746 | ± 0.510 | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.00 | sample | | 69.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.50 | sample | | 80.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.90 | sample | | 90.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.95 | sample | | 90.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.99 | sample | | 90.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.999 | sample | | 230.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.9999 | sample | | 1980.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p1.00 | sample | | 653312.000 | | ns/op |

after:

| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| ThreadMXBeanBench.getCurrentThreadUserTime | sample | 5081223 | 70.813 | ± 0.325 | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.00 | sample | | 59.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.50 | sample | | 70.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.90 | sample | | 70.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.95 | sample | | 70.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.99 | sample | | 80.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.999 | sample | | 170.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p0.9999 | sample | | 1830.000 | | ns/op |
| ThreadMXBeanBench.getCurrentThreadUserTime:p1.00 | sample | | 425472.000 | | ns/op |
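As a quick sanity check on the "around 13%" figure, the relative reduction can be worked out from the mean and median (p0.50) scores reported in these results. This is plain arithmetic on the published numbers, not an additional measurement:

$$
\frac{81.746 - 70.813}{81.746} = \frac{10.933}{81.746} \approx 0.134
\qquad
\frac{80.000 - 70.000}{80.000} = 0.125
$$

So the mean time per call drops by roughly 13.4% and the median by 12.5%.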
❗ This change is not yet ready to be integrated.
This PR further optimizes `os::current_thread_cpu_time` (used by `ThreadMXBean.getCurrentThreadUserTime()`) on Linux by using the kernel's fast path for the calling thread. I see it as a continuation of #28556.
The current code in master has 2 phases:

1. Convert the `CPUCLOCK_SCHED` timer to `CPUCLOCK_VIRT`.
2. Call `clock_gettime()`.

Within the kernel, a routine extracts the TID from the timer, then performs a radix lookup to find the bookkeeping structures for that TID.

However, the kernel also has a fast path: when the TID is `0`, the kernel skips the radix lookup and instead treats the timer as bound to the current thread (or process, depending on the lower bits of the clock). The current thread/task has its bookkeeping structures more readily available. The result is an around 13% faster `getCurrentThreadUserTime()`.

Potential concerns

- […] `TID` is 0 would break user-space.
- […] `clockid_t`. Feedback here is very much appreciated.

The benchmark from #28556 (switched to nanos + more iterations + fork count):
@JonasNorlinder: this is what I raised on the mailing-list.
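The TID-0 fast path described in the PR text can be sketched in user space as follows. This is a minimal illustration, not the code in this PR: the helper names (`make_thread_clockid`, `tid_of`, `current_thread_user_time_ns`) are hypothetical, and the constants are assumed to mirror the Linux kernel's internal `clockid_t` encoding (`CPUCLOCK_VIRT`, `CPUCLOCK_PERTHREAD_MASK`), which is not exported through user-space headers. It assumes Linux, a 32-bit `clockid_t`, and a compiler with arithmetic right shift on signed values (e.g. GCC or Clang).

```cpp
#include <cstdint>
#include <ctime>
#include <sys/types.h>

// Assumed to match the kernel's internal encoding; redefined here because
// user-space headers do not provide these names.
constexpr unsigned int kCpuClockVirt    = 1;  // user-mode CPU time only
constexpr unsigned int kPerThreadMask   = 4;  // per-thread (not per-process) clock

// Encode a per-thread CPU clock id: the bitwise-inverted TID sits in the
// upper bits, the clock type and per-thread flag in the low three bits.
constexpr clockid_t make_thread_clockid(pid_t tid, unsigned int which) {
  return static_cast<clockid_t>(
      (~static_cast<unsigned int>(tid) << 3) | which | kPerThreadMask);
}

// Decode the TID again, analogous to what the kernel does before its lookup.
constexpr pid_t tid_of(clockid_t clock) {
  return static_cast<pid_t>(~(clock >> 3));
}

// With TID 0 the encoded clock id is the constant -3.
static_assert(make_thread_clockid(0, kCpuClockVirt) == -3,
              "unexpected clockid encoding");

// Returns the calling thread's user CPU time in nanoseconds, or -1 on error.
// TID 0 means "the calling thread", so no thread lookup by id is required.
inline int64_t current_thread_user_time_ns() {
  timespec ts;
  if (clock_gettime(make_thread_clockid(0, kCpuClockVirt), &ts) != 0) {
    return -1;
  }
  return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
```

Calling `current_thread_user_time_ns()` twice around a CPU-bound loop should give a non-decreasing pair of values. Because the clock id is built by hand rather than obtained from an API, the sketch shares the concern raised above about constructing `clockid_t` manually.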
Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/29032/head:pull/29032
$ git checkout pull/29032

Update a local copy of the PR:
$ git checkout pull/29032
$ git pull https://git.openjdk.org/jdk.git pull/29032/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 29032

View PR using the GUI difftool:
$ git pr show -t 29032

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/29032.diff