
[ML] Allow on-the-fly adjustment of the number of threads in use #2232

Merged
tveasey merged 6 commits into elastic:main from tveasey:adjustable-thread-pool-size on Mar 18, 2022

@tveasey tveasey (Contributor) commented Mar 17, 2022

For PyTorch model inference, we want to be able to control, on the fly, the number of threads assigned to parallel calls to `forward`, so that we can adjust the number of cores each model consumes on a node as cluster-wide tasks are added or removed. This PR lays the groundwork by adding a dynamic setting for the number of threads the pool uses. The idea is that we always start the process thread pool with the hardware concurrency (divided by the number of threads we give libtorch), but then adjust the number of threads actually used via a control message. Unused threads simply sit idle, waiting to pop from an empty queue, so they are effectively free.
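To make the idea concrete, here is a minimal C++ sketch. It assumes that each pool thread owns its own task queue and that scheduling only distributes work over the first `numberThreadsInUse()` queues, so the remaining threads block on an empty queue. `AdjustablePool`, `numberThreadsInUse` and `initialPoolSize` are hypothetical names for illustration. This is not ml-cpp's actual thread pool or its control-message handling.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: start the pool with hardware concurrency divided by
// the number of threads each libtorch call is given (at least one thread).
std::size_t initialPoolSize(std::size_t threadsPerInference) {
    std::size_t hardware = std::max(std::thread::hardware_concurrency(), 1u);
    return std::max<std::size_t>(hardware / std::max<std::size_t>(threadsPerInference, 1), 1);
}

// Hypothetical sketch: a fixed set of threads whose *effective* size can be
// changed at runtime without creating or destroying any threads.
class AdjustablePool {
public:
    using Task = std::function<void()>;

    explicit AdjustablePool(std::size_t size)
        : m_Queues(std::max<std::size_t>(size, 1)), m_InUse(m_Queues.size()) {
        for (std::size_t i = 0; i < m_Queues.size(); ++i) {
            m_Workers.emplace_back([this, i] { this->worker(i); });
        }
    }

    ~AdjustablePool() {
        // Ask every worker to stop once its queue is drained, then join.
        for (auto& queue : m_Queues) {
            {
                std::lock_guard<std::mutex> lock{queue.mutex};
                queue.done = true;
            }
            queue.cv.notify_one();
        }
        for (auto& worker : m_Workers) {
            worker.join();
        }
    }

    // The "dynamic setting": clamp to [1, number of threads started].
    void numberThreadsInUse(std::size_t n) {
        m_InUse.store(std::clamp<std::size_t>(n, 1, m_Queues.size()));
    }
    std::size_t numberThreadsInUse() const { return m_InUse.load(); }

    // Round-robin only over the queues of threads currently in use. Tasks
    // already queued on a thread that later becomes unused still run.
    void schedule(Task task) {
        std::size_t i = m_Next.fetch_add(1) % m_InUse.load();
        assert(i < m_Queues.size());
        auto& queue = m_Queues[i];
        {
            std::lock_guard<std::mutex> lock{queue.mutex};
            queue.tasks.push_back(std::move(task));
        }
        queue.cv.notify_one();
    }

private:
    struct Queue {
        std::mutex mutex;
        std::condition_variable cv;
        std::deque<Task> tasks;
        bool done = false;
    };

    void worker(std::size_t i) {
        auto& queue = m_Queues[i];
        for (;;) {
            Task task;
            {
                // An unused thread blocks here on its empty queue, using no CPU.
                std::unique_lock<std::mutex> lock{queue.mutex};
                queue.cv.wait(lock, [&queue] { return queue.done || !queue.tasks.empty(); });
                if (queue.tasks.empty()) {
                    return; // shutting down and fully drained
                }
                task = std::move(queue.tasks.front());
                queue.tasks.pop_front();
            }
            task();
        }
    }

    std::vector<Queue> m_Queues;
    std::atomic<std::size_t> m_InUse;
    std::atomic<std::size_t> m_Next{0};
    std::vector<std::thread> m_Workers;
};
```

In this sketch, a control message that changes the thread count would reduce to a call such as `pool.numberThreadsInUse(2)`. Because the set of threads never changes, adjusting the count is a single atomic store, and threads outside the in-use range simply stay blocked on their condition variables.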

cc @dimitris-athanasiou.


@droberts195 droberts195 left a comment


LGTM

@dimitris-athanasiou dimitris-athanasiou (Contributor) left a comment


LGTM

@tveasey tveasey merged commit f73a706 into elastic:main Mar 18, 2022
@tveasey tveasey deleted the adjustable-thread-pool-size branch March 18, 2022 12:12
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Feb 27, 2026
Allows overriding the PR number from the command line, useful for
local testing of the GitHub comment feature without being in a
Buildkite PR build environment.

Tested end-to-end against build elastic#2232 (Bayesian test timeout),
posting to a throwaway PR. Both initial post and update-in-place
(deduplication) verified working.

Made-with: Cursor
edsavage added the same commit, with the same message as above, to edsavage/ml-cpp again on Mar 20, Mar 24 and Mar 26, 2026, each time referencing this pull request.

3 participants