
Simplify multigpu dispatch: all devices on pool threads #13340

Merged
Kosinkadink merged 1 commit into worksplit-multigpu from worksplit-multigpu-wip
Apr 9, 2026


@Kosinkadink
Member

Benchmarked hybrid (main thread + pool) vs all-pool on 2x RTX 4090 with SD1.5 and NetaYume models. No meaningful performance difference (within noise). All-pool is simpler: eliminates the main_device special case, main_batch_tuple deferred execution, and the 3-way branch in the dispatch loop.

Net result: -15 lines.
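
For readers unfamiliar with the pattern, the all-pool approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `dispatch_all_pool`, `run_on_device`, `fake_run`, and the device names are hypothetical, and the real dispatch loop works on model batches rather than lists of integers. The point is only that every device, including the one that used to be the main device, goes through the same submit-then-collect path.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def dispatch_all_pool(
    batches_by_device: Dict[str, Sequence[int]],
    run_on_device: Callable[[str, Sequence[int]], List[int]],
) -> Dict[str, List[int]]:
    """Submit every device's batch to a pool thread, then collect the results.

    Hypothetical sketch: there is no main-device special case and no
    deferred batch that the calling thread runs inline.
    """
    if not batches_by_device:
        return {}
    with ThreadPoolExecutor(max_workers=len(batches_by_device)) as pool:
        # Single code path: each device gets exactly one submitted task.
        futures = {
            device: pool.submit(run_on_device, device, batch)
            for device, batch in batches_by_device.items()
        }
        # result() blocks until the task finishes and re-raises worker errors.
        return {device: future.result() for device, future in futures.items()}


def fake_run(device: str, batch: Sequence[int]) -> List[int]:
    # Stand-in for a forward pass on `device` (assumption for illustration).
    return [x * 2 for x in batch]


results = dispatch_all_pool({"cuda:0": [1, 2], "cuda:1": [3, 4]}, fake_run)
print(results)
```

The calling thread only waits on futures here, which is what removes the need for a separate branch for the main device. Whether that costs anything in practice depends on the workload; the benchmark above found no meaningful difference on the tested setup.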


Amp-Thread-ID: https://ampcode.com/threads/T-019d711f-2c57-744c-acf8-6b98ecd7760e
Co-authored-by: Amp <amp@ampcode.com>
@Kosinkadink force-pushed the worksplit-multigpu-wip branch from 39e9b69 to a95eabd on April 9, 2026 08:01
@Kosinkadink merged commit 48deb15 into worksplit-multigpu on Apr 9, 2026
9 checks passed