Fix Hunyuan 3D 2.1 multi-GPU worksplit: use cond_or_uncond instead of hardcoded chunk(2) by Kosinkadink · Pull Request #13478 · Comfy-Org/ComfyUI

Kosinkadink · 2026-04-20T07:30:49Z

Problem

Hunyuan 3D 2.1 throws ValueError: not enough values to unpack (expected 2, got 1) when using multi-GPU worksplit (MultiGPU CFG Split node).

The model's forward() method hardcodes context.chunk(2, dim=0), assuming cond and uncond are always batched together. In multi-GPU worksplit mode a device may receive only one of them, so chunk(2) on a batch of 1 returns a single chunk and the two-way unpack fails.
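The failure can be reproduced outside the model with a few lines of PyTorch. This is a minimal sketch, not the code from hunyuandit.py: the tensor shapes and the helper name `unpack_two` are hypothetical and chosen only for illustration.

```python
import torch

# Hypothetical context shapes: (batch, tokens, channels).
both = torch.zeros(2, 4, 8)    # cond and uncond batched together
single = torch.zeros(1, 4, 8)  # only one of them on this device

# torch.chunk returns fewer chunks than requested when the dimension is too small.
chunks_both = both.chunk(2, dim=0)
chunks_single = single.chunk(2, dim=0)


def unpack_two(context):
    # Mirrors the hardcoded pattern described above: always expect two chunks.
    first, second = context.chunk(2, dim=0)
    return first, second


try:
    unpack_two(single)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(len(chunks_both), len(chunks_single))
print(error_message)
```

With a batch of 2 the unpack succeeds; with a batch of 1 it raises the ValueError reported in the Problem section.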

Fix

Use transformer_options["cond_or_uncond"] to detect whether both cond and uncond are present in the batch. Only perform the half-swap when both are present (len == 2 and set == {0, 1}). When only one is present, skip the reordering entirely (identity operation).
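The gating condition can be sketched as a small helper. This is an illustration under stated assumptions rather than the patched code itself: the function name `swap_halves_if_paired` is hypothetical, and treating a missing `cond_or_uncond` key as "do not swap" is an assumption made here, not something the PR states.

```python
import torch


def swap_halves_if_paired(tensor, transformer_options):
    """Swap the two batch halves only when both cond and uncond are present."""
    # Assumption: a missing key is treated like a single-entry batch (no swap).
    cond_or_uncond = transformer_options.get("cond_or_uncond", [])
    if len(cond_or_uncond) == 2 and set(cond_or_uncond) == {0, 1}:
        first, second = tensor.chunk(2, dim=0)
        return torch.cat([second, first], dim=0)
    # Only one of cond/uncond is on this device: identity operation.
    return tensor


# Batch entries are tagged 0.0 and 1.0 so the reordering is visible.
paired = torch.tensor([0.0, 1.0]).view(2, 1, 1)
lone = torch.tensor([5.0]).view(1, 1, 1)

swapped = swap_halves_if_paired(paired, {"cond_or_uncond": [0, 1]})
untouched = swap_halves_if_paired(lone, {"cond_or_uncond": [1]})

print(swapped.flatten().tolist())
print(untouched.flatten().tolist())
```

The paired batch comes back with its halves exchanged, while the single-entry batch passes through unchanged, so chunk(2) is never called on a batch it cannot split.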

Changes

comfy/ldm/hunyuan3dv2_1/hunyuandit.py: Replace hardcoded chunk(2) with conditional swap gated on cond_or_uncond

Performance (dual RTX 4090)

Sampling-only speed (isolating the accelerated portion)

Resolution	Single GPU	Multi GPU	Sampling Speedup
4096	3.58 it/s	6.38 it/s	1.78x
8192	1.62 it/s	3.10 it/s	1.91x

End-to-end wall time (includes non-accelerated VAE decode + mesh generation)

Resolution	Single GPU	Multi GPU	Overall Speedup
4096	22.79s	19.22s	1.19x
8192	35.76s	28.02s	1.28x

Sampling-only speedup is near-theoretical 2x for CFG split across 2 GPUs. Overall speedup is lower because VAE decode and mesh generation are not accelerated by worksplit.
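The speedup columns can be rechecked from the raw numbers in the two tables. The sketch below also estimates sampling time as steps divided by it/s, assuming the 30 steps listed in the benchmark settings; that estimate is a rough derivation for illustration, not a measured value.

```python
STEPS = 30  # from the benchmark settings; used only for the rough time estimate

# resolution -> (single GPU, multi GPU)
sampling_its = {4096: (3.58, 6.38), 8192: (1.62, 3.10)}      # iterations per second
wall_seconds = {4096: (22.79, 19.22), 8192: (35.76, 28.02)}  # end-to-end seconds

# Throughput speedup is multi / single; wall-time speedup is single / multi.
sampling_speedup = {r: round(m / s, 2) for r, (s, m) in sampling_its.items()}
overall_speedup = {r: round(s / m, 2) for r, (s, m) in wall_seconds.items()}

# Estimated seconds saved in sampling vs. seconds saved end to end.
sampling_saved = {r: STEPS / s - STEPS / m for r, (s, m) in sampling_its.items()}
wall_saved = {r: s - m for r, (s, m) in wall_seconds.items()}

for res in sampling_its:
    print(res, sampling_speedup[res], overall_speedup[res],
          round(sampling_saved[res], 2), round(wall_saved[res], 2))
```

The computed ratios match the table values (1.78x and 1.91x for sampling, 1.19x and 1.28x overall). The end-to-end saving in seconds is of the same order as the estimated sampling saving, which is consistent with the remaining stages running unaccelerated.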

Benchmarks: 2 warmup runs + 5 timed runs, batch_size=1, 30 steps, euler sampler.

Kosinkadink requested review from comfyanonymous and guill as code owners April 20, 2026 07:30

Fix Hunyuan 3D 2.1 multi-GPU worksplit: use cond_or_uncond instead of…

4b8fa98

… hardcoded chunk(2) Amp-Thread-ID: https://ampcode.com/threads/T-019da964-2cc8-77f9-9aae-23f65da233db Co-authored-by: Amp <amp@ampcode.com>

Kosinkadink force-pushed the worksplit-multigpu-wip branch from 3e7c01f to 4b8fa98 Compare April 20, 2026 09:12

Kosinkadink merged commit 37deccb into worksplit-multigpu Apr 20, 2026
9 checks passed

Kosinkadink merged 1 commit into worksplit-multigpu from worksplit-multigpu-wip
