fix: compile inner model before DDP wrapping to prevent Dynamo tracing DDP internals #4017

Open
anishesg wants to merge 1 commit into huggingface:main from anishesg:fix/ph-issue-3991


@anishesg

What does this PR do?

When using torch.compile with multi-GPU (DDP) training via Accelerate, users hit a crash during the forward pass:

torch._dynamo.exc.Unsupported: Unsupported method call
  Explanation: Dynamo does not know how to trace method `set_runtime_stats_and_log` of class `Logger`

The root cause is in prepare_model in accelerator.py: it wrapped the model with DistributedDataParallel first and then applied torch.compile to the DDP wrapper. As a result, Dynamo traced into DDP's internal _pre_forward hook, which calls self.logger.set_runtime_stats_and_log(), a method on a user-defined object that Dynamo cannot trace.

The fix follows the PyTorch-recommended pattern for DDP + torch.compile: compile the inner model before wrapping it with DDP. DDP then operates outside the compiled region, so its internal logging and communication hooks are never seen by Dynamo. This is applied to both the MULTI_GPU and MULTI_CPU DDP paths in prepare_model. The final compile guard is also updated to skip models that already have compiled submodules (via has_compiled_regions), preventing the DDP wrapper from being double-compiled.
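The ordering described above can be sketched without PyTorch. This is only an illustration of the control flow, not the actual code in accelerator.py: `Module`, `fake_compile`, `fake_ddp`, and this simplified `prepare_model` are hypothetical stand-ins for `torch.nn.Module`, `torch.compile`, `DistributedDataParallel`, and Accelerate's method, and the `has_compiled_regions` below is a toy re-implementation that only assumes the real helper reports whether any submodule is already compiled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """Hypothetical stand-in for torch.nn.Module."""
    name: str
    children: List["Module"] = field(default_factory=list)
    compiled: bool = False


def fake_compile(module: Module) -> Module:
    # Stand-in for torch.compile: returns a wrapper marked as compiled.
    return Module(f"compiled({module.name})", [module], compiled=True)


def fake_ddp(module: Module) -> Module:
    # Stand-in for DistributedDataParallel: a plain, uncompiled wrapper.
    return Module(f"DDP({module.name})", [module])


def has_compiled_regions(module: Module) -> bool:
    # Toy version: true if the module or any descendant is compiled.
    return module.compiled or any(has_compiled_regions(c) for c in module.children)


def prepare_model(model: Module, distributed: bool, use_compile: bool) -> Module:
    if distributed:
        if use_compile:
            # Compile the inner model first, so the DDP wrapper stays
            # outside the compiled region.
            model = fake_compile(model)
        model = fake_ddp(model)
    # Final guard: skip models that already contain compiled submodules,
    # so the DDP wrapper is not compiled a second time.
    if use_compile and not has_compiled_regions(model):
        model = fake_compile(model)
    return model


print(prepare_model(Module("net"), distributed=True, use_compile=True).name)
print(prepare_model(Module("net"), distributed=False, use_compile=True).name)
```

In the distributed case the compiled module ends up inside the DDP wrapper, while the single-process case still reaches the final compile step unchanged.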

Fixes #3991

…g DDP internals

## What does this PR do?

Signed-off-by: anish k <ak8686@princeton.edu>
@anishesg anishesg mentioned this pull request Apr 25, 2026
4 tasks
@yuxinyuan
Contributor

The PyTorch DDP notes (https://docs.pytorch.org/docs/2.12/notes/ddp.html) specifically mention that DDP works with TorchDynamo: "When used with TorchDynamo, apply the DDP model wrapper before compiling the model, such that torchdynamo can apply DDPOptimizer (graph-break optimizations) based on DDP bucket sizes." That is the opposite order from the one this PR applies.

Maybe it's a PyTorch issue?
