Merged
6 changes: 6 additions & 0 deletions trinity/trainer/verl/fsdp_workers.py
@@ -958,6 +958,12 @@ def update_actor(self, data: DataProto):
"After offload actor optimizer during update_actor", logger=self.logger
)

# Release reserved GPU memory held by PyTorch's caching allocator after
# backward passes. Without this, memory_reserved grows monotonically and
# eventually starves vLLM during weight sync in colocate mode.
# Matches the pattern in megatron_workers.py update_actor().
torch.cuda.empty_cache()

return output

@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name="actor"))
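
The comment in this hunk relies on the difference between memory PyTorch has allocated to live tensors and memory its caching allocator has merely reserved. A minimal sketch of how that effect could be measured is below. The helper name `release_cached_blocks` is hypothetical and is not part of Trinity or verl; only `torch.cuda.is_available`, `torch.cuda.memory_reserved`, and `torch.cuda.empty_cache` are real PyTorch calls. The guarded import is an assumption made so the sketch also loads on machines without PyTorch or without a GPU.

```python
try:
    import torch
except ImportError:  # assumption: allow the sketch to load without PyTorch
    torch = None


def release_cached_blocks() -> dict:
    """Hypothetical helper: call torch.cuda.empty_cache() and report the effect.

    Returns the reserved bytes before and after the call, plus the difference.
    All values are 0 when PyTorch or CUDA is unavailable.
    """
    if torch is None or not torch.cuda.is_available():
        return {"reserved_before": 0, "reserved_after": 0, "released": 0}

    # memory_reserved() counts every block held by the caching allocator,
    # including blocks that no live tensor currently occupies.
    before = torch.cuda.memory_reserved()

    # empty_cache() returns only the unoccupied cached blocks to the driver;
    # memory backing live tensors is not affected.
    torch.cuda.empty_cache()

    after = torch.cuda.memory_reserved()
    return {
        "reserved_before": before,
        "reserved_after": after,
        "released": before - after,
    }


if __name__ == "__main__":
    print(release_cached_blocks())
```

Logging the `released` value once per `update_actor` call would be one way to check whether the added `torch.cuda.empty_cache()` actually frees a meaningful amount of memory before weight sync in colocate mode.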