fix: separate different model flashinfer params #697

Vinkle-hzt · 2026-02-11T03:08:29Z

The draft model and the target model should each hold their own flashinfer params cache.
Otherwise, because the params contain host (CPU) buffers and there is no sync after the draft model's forward pass, the target model can rewrite the params while work launched by the draft model may still be using them. This causes a CUDA illegal memory access in PersistentVariableLengthMergeStatesKernel.
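To illustrate the idea, here is a minimal sketch in plain Python of keeping one params entry per model instead of a single shared one. `AttentionParams`, `ParamsCache`, and the model names `"draft"` and `"target"` are hypothetical stand-ins for illustration only; they are not the classes touched by this PR, and the list stands in for a host-side buffer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttentionParams:
    """Hypothetical stand-in for one model's flashinfer params (host-side buffers)."""
    host_buffer: List[int] = field(default_factory=list)

    def fill(self, values: List[int]) -> None:
        # Rewrite the host buffer in place, as a params refresh would.
        self.host_buffer[:] = values


class ParamsCache:
    """Keeps one AttentionParams entry per model name instead of a single shared one."""

    def __init__(self) -> None:
        self._entries: Dict[str, AttentionParams] = {}

    def get(self, model_name: str) -> AttentionParams:
        # Create the entry lazily; repeated calls for the same model reuse it.
        if model_name not in self._entries:
            self._entries[model_name] = AttentionParams()
        return self._entries[model_name]


cache = ParamsCache()

draft_params = cache.get("draft")
draft_params.fill([1, 2, 3])       # draft forward would read from this buffer

target_params = cache.get("target")
target_params.fill([9, 9, 9, 9])   # target rewrites only its own buffer

# The draft model's buffer is untouched by the target model's rewrite.
print(draft_params.host_buffer)
print(target_params is draft_params)
```

With a single shared entry, the second `fill` would overwrite the buffer that the draft model's still-pending work depends on; keying the cache by model avoids that without adding a sync after the draft forward pass.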

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

fix: separate different model flashinfer params

48357f0

Vinkle-hzt requested a review from LLLLKKKK as a code owner February 11, 2026 03:08

Copilot AI review requested due to automatic review settings February 11, 2026 03:08

Copilot AI reviewed Feb 11, 2026

Vinkle-hzt enabled auto-merge (rebase) February 11, 2026 03:28

Copilot started reviewing on behalf of Vinkle-hzt February 11, 2026 03:49
