Fix Hunyuan 3D 2.1 multi-GPU worksplit: use cond_or_uncond instead of hardcoded chunk(2) #13478

Merged
Kosinkadink merged 1 commit into worksplit-multigpu from worksplit-multigpu-wip
Apr 20, 2026
Kosinkadink commented Apr 20, 2026

Problem

Hunyuan 3D 2.1 raises ValueError: not enough values to unpack (expected 2, got 1) when run with multi-GPU worksplit (MultiGPU CFG Split node).

The model's forward() method hardcodes context.chunk(2, dim=0), assuming cond and uncond are always batched together. In multi-GPU worksplit mode a device may receive only one of them, so chunk(2) on a batch of 1 returns a single chunk, and unpacking that result into two values fails.
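
The failure mode can be reproduced without the model. The following is a minimal sketch, not the model's code: the helper chunk() is a hypothetical plain-Python stand-in for Tensor.chunk(chunks, dim=0) that operates on a list, and the "ctx_a"/"ctx_b" entries are placeholder batch items.

```python
import math

def chunk(batch, chunks):
    """Hypothetical stand-in for Tensor.chunk(chunks, dim=0) on a Python list.

    Pieces have size ceil(len(batch) / chunks), so fewer pieces than
    requested can come back when the batch is too small.
    """
    size = max(1, math.ceil(len(batch) / chunks))
    return tuple(batch[i:i + size] for i in range(0, len(batch), size))

# Both cond and uncond are batched together: two chunks, unpacking works.
first, second = chunk(["ctx_a", "ctx_b"], 2)

# Worksplit handed this device only one of them: a single chunk comes back,
# and unpacking it into two names raises ValueError.
try:
    first_only, missing = chunk(["ctx_a"], 2)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

The message captured in error is the one reported in the Problem section: the unpacking expects two values and gets one.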

Fix

Use transformer_options["cond_or_uncond"] to detect whether both cond and uncond are present in the batch. Perform the half-swap only when both are present (len == 2 and set == {0, 1}). When only one is present, skip the reordering entirely, which leaves context unchanged (identity operation).

Changes

  • comfy/ldm/hunyuan3dv2_1/hunyuandit.py: Replace hardcoded chunk(2) with conditional swap gated on cond_or_uncond
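
The gating described under Fix can be sketched as follows. This is an illustration under stated assumptions, not the merged diff: maybe_swap_halves is a hypothetical name, and context is a plain list standing in for a tensor batched along dim 0 (with tensors, the swap itself would be done with chunk(2, dim=0) and torch.cat).

```python
def maybe_swap_halves(context, cond_or_uncond):
    """Swap the two batch halves only when both cond and uncond are present.

    `context` is a list standing in for a tensor batched along dim 0.
    `cond_or_uncond` mirrors transformer_options["cond_or_uncond"].
    """
    both_present = len(cond_or_uncond) == 2 and set(cond_or_uncond) == {0, 1}
    if not both_present:
        # Only one of cond/uncond is on this device: leave the batch as is.
        return context
    half = len(context) // 2
    return context[half:] + context[:half]

# Single-GPU style call: both halves present, so they are swapped.
transformer_options = {"cond_or_uncond": [0, 1]}
swapped = maybe_swap_halves(["a", "b"], transformer_options["cond_or_uncond"])

# Worksplit style call: this device holds only one of them, so nothing moves.
single = maybe_swap_halves(["a"], [1])

print(swapped, single)
```

Checking both the length and the set of values means a batch holding two entries of the same kind is also left untouched.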

Performance (dual RTX 4090)

Sampling-only speed (isolating the accelerated portion)

| Resolution | Single GPU | Multi GPU | Sampling Speedup |
|------------|------------|-----------|------------------|
| 4096       | 3.58 it/s  | 6.38 it/s | 1.78x            |
| 8192       | 1.62 it/s  | 3.10 it/s | 1.91x            |

End-to-end wall time (includes non-accelerated VAE decode + mesh generation)

| Resolution | Single GPU | Multi GPU | Overall Speedup |
|------------|------------|-----------|-----------------|
| 4096       | 22.79s     | 19.22s    | 1.19x           |
| 8192       | 35.76s     | 28.02s    | 1.28x           |

Sampling-only speedup (1.78x to 1.91x) approaches the theoretical 2x for CFG split across 2 GPUs. Overall speedup is lower because VAE decode and mesh generation are not accelerated by worksplit.

Benchmarks: 2 warmup runs + 5 timed runs, batch_size=1, 30 steps, euler sampler.
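
As a rough cross-check of the figures above, the speedups can be recomputed from the reported numbers. This sketch assumes sampling time is approximately steps divided by it/s, which ignores any per-run overhead; summarize is a hypothetical helper name.

```python
STEPS = 30  # from the benchmark setup

# resolution: (single-GPU it/s, multi-GPU it/s, single-GPU wall s, multi-GPU wall s)
runs = {
    4096: (3.58, 6.38, 22.79, 19.22),
    8192: (1.62, 3.10, 35.76, 28.02),
}

def summarize(single_its, multi_its, single_wall, multi_wall, steps=STEPS):
    # Assumption: sampling time is roughly steps / (iterations per second).
    sampling_saved = steps / single_its - steps / multi_its
    return {
        "sampling_speedup": round(multi_its / single_its, 2),
        "overall_speedup": round(single_wall / multi_wall, 2),
        "sampling_saved_s": round(sampling_saved, 2),
        "wall_saved_s": round(single_wall - multi_wall, 2),
    }

summary = {res: summarize(*vals) for res, vals in runs.items()}
for res, row in summary.items():
    print(res, row)
```

Under that assumption, the estimated sampling time saved (about 3.7 s at 4096 and 8.8 s at 8192) is in the same range as the measured wall-time savings (3.57 s and 7.74 s), which fits the explanation that the gain comes from the sampling stage alone.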

Kosinkadink force-pushed the worksplit-multigpu-wip branch from 3e7c01f to 4b8fa98 April 20, 2026 09:12
Kosinkadink merged commit 37deccb into worksplit-multigpu Apr 20, 2026
9 checks passed