[spec v2]Fix torch gc of future indices#18958

Merged
hnyls2002 merged 3 commits into main from lsyin/fix-spec-v2-data-race
Feb 19, 2026
Conversation

hnyls2002 (Collaborator) commented Feb 18, 2026

How to reproduce this on small models (llama3-8b on H200): insert an artificial stall on the forward stream to widen the race window.

diff --git a/python/sglang/srt/managers/scheduler.py b/python/sglang/srt/managers/scheduler.py
index 3435fcaef..526bf04df 100644
--- a/python/sglang/srt/managers/scheduler.py
+++ b/python/sglang/srt/managers/scheduler.py
@@ -2332,6 +2332,7 @@ class Scheduler(
 
                 with self.forward_stream_ctx:
                     self.forward_stream.wait_stream(self.default_stream)
+                    torch.cuda._sleep(1_000_000_000)
                     self.future_map.resolve_future(model_worker_batch)
                     with self.record_forward_metrics(batch):
                         batch_result = self.model_worker.forward_batch_generation(

fix #18744
close #18803

I thought carefully about @nvcastet's and @trevor-m's analysis of the data races, and I concluded that there are actually no data races in the traditional sense. Even though prepare_for_decode and _draft_extend_for_decode can access the shared buffer req_to_token at the same time, there are no conflicts between these two phases, and even if there were, they wouldn't cause out-of-bounds errors or illegal memory access (IMA).
So I started thinking about whether some tensors could be garbage-collected because they were not recorded across streams. Some tensors are created on the scheduler (default) stream but used on the forward_stream, and their reference counts drop to zero during forwarding.
I originally thought the problematic tensor would be a direct field of ScheduleBatch, so as an ablation I fully cloned all GPU tensors in ModelWorkerBatch. Even after that, the IMA still occurred. Trevor gave me a very useful hint: the indices inside FutureMap held bad values that didn't make sense. That pointed me to the root cause: future_indices.indices is allocated on the default stream, used on the forward_stream, and its Python references are dropped (when model_worker_batch.spec_info and batch.spec_info are replaced) before the GPU finishes reading it. Once the reference count hits zero, the caching allocator can hand the same memory to a new allocation, which clobbers the indices mid-read. The fix is a record_stream call on the indices tensor, which tells the allocator to defer reuse until the forward_stream's pending work completes.
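The lifetime hazard and the record_stream fix can be sketched with a framework-free analogy. Everything below (Stream, ToyAllocator, run_scenario) is an illustrative stand-in for CUDA's stream-ordered caching allocator, not PyTorch or sglang internals:

```python
class Stream:
    """Stands in for a CUDA stream with queued, unfinished work."""
    def __init__(self):
        self.busy = True


class ToyAllocator:
    """Stands in for the CUDA caching allocator: freed blocks are pooled and
    handed back to the very next allocation, unless a pending use on another
    stream was recorded (the record_stream escape hatch)."""
    def __init__(self):
        self.pool = []       # blocks ready for immediate reuse
        self.deferred = []   # blocks freed while another stream still reads them

    def alloc(self, fill):
        block = self.pool.pop() if self.pool else {"data": None}
        block["data"] = list(fill)   # reusing a block overwrites its old contents
        return block

    def free(self, block, recorded_stream=None):
        if recorded_stream is not None and recorded_stream.busy:
            self.deferred.append(block)   # hold the block until the reader drains
        else:
            self.pool.append(block)       # eligible for reuse immediately


def run_scenario(use_record_stream):
    alloc = ToyAllocator()
    forward_stream = Stream()
    # "future indices" allocated on the default (scheduler) stream:
    indices = alloc.alloc(fill=[0, 1, 2, 3])
    pending_read = indices  # the forward stream will read this block later
    # The Python reference is dropped (spec_info replaced) before the
    # forward stream has actually performed its read:
    alloc.free(indices, forward_stream if use_record_stream else None)
    # The scheduler allocates something new; the caching allocator may hand
    # back the very same block and clobber it:
    alloc.alloc(fill=[9, 9, 9, 9])
    # Only now does the forward stream finally read the indices:
    return pending_read["data"]
```

Without the recorded stream, the late read observes the clobbered contents `[9, 9, 9, 9]`; with it, the block is quarantined until the stream drains and the read sees the original `[0, 1, 2, 3]`. This mirrors why a record_stream call on future_indices.indices resolves the IMA.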

hnyls2002 (Collaborator, Author):

/tag-and-rerun-ci

trevor-m (Collaborator):

Confirmed that this fixed the issue on our wideep disagg repro.

@hnyls2002 hnyls2002 merged commit 5ff5aa6 into main Feb 19, 2026
362 of 391 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/fix-spec-v2-data-race branch February 19, 2026 19:38
trevor-m pushed a commit to trevor-m/sglang that referenced this pull request Mar 6, 2026
Successfully merging this pull request may close these issues.

[Bug] index out of bounds error with Spec V2