[Fix] data race in req_to_token pool#17850

Merged
merrymercy merged 6 commits into main from csy/fix_req_to_pool on Feb 2, 2026
Conversation


@cctry cctry commented Jan 28, 2026

Motivation

A chunked prefill request frees its slot in req_to_token_pool and is allocated a slot again when preparing for its next prefill batch.

As a result, if a prefill batch contains multiple requests and req_to_token_pool is at capacity, the write of another request's matched KV indices can land in the slot of the chunked request while that slot is still being read on the forward stream.

Example

Prepare & launch prefill batch N:
    req A (first half) --> idx 1

model runner reads idx 1

Prepare batch N+1:
    req A (second half) --> idx 2
    req B --> idx 1

scheduler writes req B's matched indices to idx 1

Modifications

  1. alloc(reqs: list[Req]) - now takes the request list, sets req.req_pool_idx directly, and reuses the slot if it is already set. cc @hnyls2002
  2. Split mamba-state freeing out of free() into free_mamba_cache(req, ...) in HybridReqToTokenPool - frees only the mamba state, not the req slot. cc @hanming-lu @yizhang2077
  3. release_kv_cache() - now calls free(req) at the end; handles the early mamba-only free case.
  4. Removed the free() calls in process_prefill_chunk and cache_finished_req.
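The slot-reuse idea in modification 1 can be sketched as follows. This is a hypothetical minimal version, not the actual sglang implementation; the `Req` class and the pool internals are heavily simplified:

```python
# Minimal sketch (assumed names, not the real sglang code) of an
# alloc() that takes the request list and reuses an already-held slot,
# instead of the free-then-realloc pattern that caused the race.
from typing import List, Optional


class Req:
    def __init__(self, rid: str):
        self.rid = rid
        self.req_pool_idx: Optional[int] = None


class ReqToTokenPool:
    def __init__(self, size: int):
        self.free_slots = list(range(size))

    def alloc(self, reqs: List[Req]) -> bool:
        # Only requests without a slot need a new index; a chunked
        # request keeps the slot from its previous prefill chunk, so
        # its row can never be handed to another request mid-read.
        need = [r for r in reqs if r.req_pool_idx is None]
        if len(need) > len(self.free_slots):
            return False  # pool at capacity
        for r in need:
            r.req_pool_idx = self.free_slots.pop()
        return True

    def free(self, req: Req) -> None:
        if req.req_pool_idx is not None:
            self.free_slots.append(req.req_pool_idx)
            req.req_pool_idx = None
```

In the example timeline above, req A would keep idx 1 across both chunks, so req B would be given a different slot rather than A's still-in-use row.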

Accuracy Tests

Benchmarking and Profiling

Checklist



cctry commented Jan 28, 2026

/tag-run-ci-label

@Henrry-CHEN

So if a prefill batch contains 2 or more requests or request chunks, is the mamba state for these requests incorrect?


cctry commented Jan 28, 2026 via email

@cctry cctry force-pushed the csy/fix_req_to_pool branch from 60ab814 to 30b9b41 on January 28, 2026 21:31
@merrymercy merrymercy merged commit 027f314 into main Feb 2, 2026
194 of 214 checks passed
@merrymercy merrymercy deleted the csy/fix_req_to_pool branch February 2, 2026 22:38
charlesHsuGG pushed a commit to charlesHsuGG/sglang that referenced this pull request Feb 5, 2026
sfiisf pushed a commit to sfiisf/sglang that referenced this pull request Feb 5, 2026
@ClawSeven
Collaborator

Hi @cctry,
I have a question regarding the described race conditions. If we disable overlap scheduling, could there still be other write race conditions?


ClawSeven commented Feb 10, 2026

@cctry, Oh, I see. Since the kernel is launched asynchronously, even with overlap scheduling disabled, there could still be a race condition between write_cache_indices and the backend reading the KV caches.
Thank you for the fix—the PR not only resolves the chunked prefill issue but also addresses a potential problem in dLLM.
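The launch-order-versus-execution-order point above can be illustrated with a pure-Python analogy (hypothetical, no GPU involved): a "kernel" thread is launched first but only reads the shared row later, so a host-side overwrite issued after the launch is still visible to it.

```python
# Pure-Python analogy of the race: a kernel launched for request A
# ends up reading indices that the scheduler wrote for request B,
# because launch order is not execution order.
import threading

req_to_token = [[1, 1, 1, 1]]   # pool row 0, filled for request A
write_done = threading.Event()
observed = []


def forward_kernel():
    # Simulates the asynchronously executing kernel: it was launched
    # earlier, but only gets around to reading the row later.
    write_done.wait()
    observed.append(list(req_to_token[0]))


t = threading.Thread(target=forward_kernel)
t.start()                        # "launch" returns immediately
req_to_token[0] = [9, 9, 9, 9]   # scheduler recycles row 0 for request B
write_done.set()
t.join()
# observed now holds request B's indices, not request A's.
```

The real fix is of course not synchronization here but never recycling a slot whose row may still be read in flight.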

nvcastet added a commit to nvcastet/sglang that referenced this pull request Feb 13, 2026
…vent cross-stream data race

In overlap scheduling (MTPv2), `process_batch_result(N-1)` runs on the
default stream concurrently with `forward(N)` on the forward stream.
When a request finishes, `release_kv_cache` immediately returns its
`req_pool_idx` to the free list.  A new request can then recycle that
pool index and `prepare_for_decode` overwrites the `req_to_token` row on
the default stream while `forward(N)` still reads it — causing an
"index out of bounds" assertion in IndexKernel.cu.

Fix: defer the pool-index free by one overlap iteration.

- `ReqToTokenPool.deferred_free(req)`: withholds the pool index from
  the free list (the slot cannot be reallocated).
- `ReqToTokenPool.flush_deferred_frees()`: moves deferred slots back to
  the free list once the forward that read them has completed.
- `release_kv_cache(..., defer_pool_free=True)`: used in the decode
  result-processing path when overlap is enabled.
- `process_batch_result_decode`: flushes deferred frees right after
  `copy_done.synchronize()`, which guarantees the previous forward
  has finished reading `req_to_token`.

This is the overlap-scheduling counterpart of PR sgl-project#17850, which fixed the
same class of race for chunked prefill.
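The deferred-free mechanism described in this commit message can be sketched as follows (a hypothetical minimal version whose method names mirror the commit message, not the actual sglang code):

```python
# Sketch of a deferred pool-index free: a freed slot is withheld from
# the free list for one overlap iteration, so the in-flight forward
# can never observe its row being recycled.
from typing import List


class ReqToTokenPool:
    def __init__(self, size: int):
        self.free_slots: List[int] = list(range(size))
        self._deferred: List[int] = []

    def deferred_free(self, req_pool_idx: int) -> None:
        # The request no longer owns the index, but it must not be
        # reallocated while the previous forward may still read its
        # req_to_token row.
        self._deferred.append(req_pool_idx)

    def flush_deferred_frees(self) -> None:
        # Called right after copy_done.synchronize(), i.e. once the
        # previous forward is guaranteed to have finished reading
        # req_to_token; only now do the slots become allocatable.
        self.free_slots.extend(self._deferred)
        self._deferred.clear()
```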
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026