fix(trainer): add empty_cache() after compute_ref_log_prob to prevent OOM by Leon-Algo · Pull Request #548 · agentscope-ai/Trinity-RFT

Leon-Algo · 2026-05-25T00:44:18Z

Summary

Add torch.cuda.empty_cache() after the reference model forward pass in compute_ref_log_prob().

This complements #541, which added empty_cache() after update_policy() for the actor path. The ref model path shows the same growth in reserved GPU memory.

Root Cause

In colocate mode, vLLM and the FSDP trainer share the same GPU. During compute_ref_log_prob(), the ref model creates large intermediate tensors (logits with vocab_size up to 248K). After output.to("cpu") moves the result to CPU, the intermediates are freed, but PyTorch's caching allocator keeps the GPU blocks that held them reserved for reuse instead of returning them to CUDA. This reserved memory grows monotonically across training steps and is never released, eventually causing OOM.
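To give a sense of scale for the logits intermediates, here is a back-of-the-envelope sketch. The vocabulary size is the approximate "248K" figure from this description; the token count per micro-batch and the dtypes are assumptions chosen only for illustration, not values taken from the trainer.

```python
GIB = 1024 ** 3


def logits_bytes(num_tokens: int, vocab_size: int, bytes_per_element: int) -> int:
    """Size in bytes of a dense [num_tokens, vocab_size] logits tensor."""
    return num_tokens * vocab_size * bytes_per_element


vocab_size = 248_000  # approximation of "up to 248K" from the description
num_tokens = 8_192    # hypothetical number of tokens in one micro-batch

fp32 = logits_bytes(num_tokens, vocab_size, 4)  # 4 bytes per float32 element
bf16 = logits_bytes(num_tokens, vocab_size, 2)  # 2 bytes per bfloat16 element

print(f"fp32 logits: {fp32 / GIB:.2f} GiB")
print(f"bf16 logits: {bf16 / GIB:.2f} GiB")
```

Under these assumptions a single fp32 logits tensor is several GiB, so even a few such blocks left cached can take a large share of an 80 GB card that vLLM also needs.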

What This PR Does

Adds torch.cuda.empty_cache() right after output = output.to("cpu") in compute_ref_log_prob(). The call returns the caching allocator's unused cached blocks to CUDA so they can be reused by vLLM and subsequent training steps. Memory still held by live tensors is not affected.
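The pattern can be sketched in isolation as follows. This is not the diff itself: `offload_and_release` is a hypothetical helper name used only for illustration, and the guarded import is there so the sketch also runs on a machine without PyTorch or without a GPU.

```python
try:
    import torch
except ImportError:  # lets the sketch import on machines without PyTorch
    torch = None


def offload_and_release(output):
    """Move a result to CPU, then return unused cached GPU blocks to CUDA.

    `output` is any object with a `.to(device)` method, for example a tensor.
    """
    output = output.to("cpu")
    if torch is not None and torch.cuda.is_available():
        # Releases only cached blocks that no live tensor occupies;
        # memory still referenced by tensors is left untouched.
        torch.cuda.empty_cache()
    return output
```

The order matters: the cache is emptied only after the result has left the GPU and the intermediates are out of scope, otherwise the blocks would still be in use and could not be released.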

Verification

Tested on A100 80GB with Qwen3.5-0.8B in colocate mode:

Metric	Without patch	With patch
OOM at step 2	Yes (77 GiB reserved)	No
Stable memory	N/A	~40 GiB reserved
Training steps completed	1	5+

The fix is a single 3-line addition (comment + empty_cache() call) with no behavioral changes to training logic.
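Reserved-memory figures like the ones in the table can be collected with PyTorch's memory counters. The sketch below is one possible way to log them per step; the helper names are hypothetical and this is not necessarily how the numbers above were gathered.

```python
try:
    import torch
except ImportError:  # lets the sketch import on machines without PyTorch
    torch = None

GIB = 1024 ** 3


def gpu_memory_gib(device=None):
    """Return allocated and reserved CUDA memory in GiB (zeros without a GPU)."""
    if torch is None or not torch.cuda.is_available():
        return {"allocated": 0.0, "reserved": 0.0}
    return {
        # Bytes currently occupied by live tensors.
        "allocated": torch.cuda.memory_allocated(device) / GIB,
        # Bytes held by the caching allocator, including unused cached blocks.
        "reserved": torch.cuda.memory_reserved(device) / GIB,
    }


def log_step_memory(step, device=None):
    """Print and return a one-line memory summary for a training step."""
    stats = gpu_memory_gib(device)
    line = (
        f"step {step}: allocated={stats['allocated']:.2f} GiB "
        f"reserved={stats['reserved']:.2f} GiB"
    )
    print(line)
    return line
```

Calling log_step_memory(step) at the end of each training step makes the symptom visible: a reserved value that keeps climbing while allocated stays roughly flat points at cached blocks rather than live tensors.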

… OOM

Add torch.cuda.empty_cache() after the reference model forward pass in compute_ref_log_prob().

Without this, PyTorch's caching allocator retains GPU memory reserved during the ref-log-prob computation, and memory_reserved grows monotonically across training steps, eventually causing OOM in colocate mode where vLLM and FSDP trainer share the same GPU.

This complements PR agentscope-ai#541 which added empty_cache() after update_policy() for the actor path. The ref model path has the same issue: it creates large intermediate tensors (logits with vocab_size up to 248K) that remain reserved even after being moved to CPU.

Verified on A100 80GB with Qwen3.5-0.8B colocate training:
- Without patch: OOM at step 2 (77 GiB reserved)
- With patch: stable at ~40 GiB reserved across 5+ steps

pan-x-c

LGTM

pan-x-c approved these changes May 25, 2026

pan-x-c merged commit 991eda5 into agentscope-ai:main May 25, 2026
1 check passed

pan-x-c merged 1 commit into agentscope-ai:main from Leon-Algo:fix/fsdp-empty-cache-after-ref-log-prob
