[Feature] Support offload and wake up of SGLang Diffusion

### Checklist

- [x] If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- [x] Please use English. Otherwise, it will be closed.

### Motivation

In LLM RL, sleeping and waking up an SGLang server is widely used and well optimized for co-located placement, as detailed in @hebiao064's blog post: https://hebiao064.github.io/rl-memory-management

In LLM RL, we use `torch_memory_saver` to preserve the virtual addresses of the SGLang LLM server so that CUDA Graph stays alive across sleep and wake up. SGLang Diffusion does not support CUDA Graph yet (@zyksir is working on it), so we can afford a more brute-force method to sleep and wake up. In the extreme case, we can even kill and relaunch the SGLang Diffusion server; the relaunch time is profiled in https://github.com/sgl-project/sglang/issues/19087

Given this, we need a way to sleep and wake up SGLang Diffusion. Ideally the API would be similar to https://docs.sglang.io/advanced_features/sglang_for_rl.html#fine-grained-engine-sleep-and-wake-up , but the starting point can be more brute force.

If I were to handle this issue myself, I would break it down into the following steps:

1. Try a brute-force way to sleep and wake up the SGLang Diffusion server (for example, offloading some crucial parts to CPU; I am not sure which ones), and compare it with directly killing and relaunching. If brute force is the best, then we are so cooked. 🤣
2. If sleeping and waking up does help, design sleep and wake up APIs, following what we did for LLMs: https://docs.sglang.io/advanced_features/sglang_for_rl.html#fine-grained-engine-sleep-and-wake-up . Such an API would be great.
3. Once 2 is done, please provide the end-to-end time of "sleep, wake up + [refit](https://github.com/sgl-project/sglang/pull/18306)" vs "kill and relaunch". Hopefully this time we can get a further speedup.
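
For the comparisons in steps 1 and 3, a small timing harness may be enough to start with. The sketch below is only an illustration: `time_strategy`, `compare`, and the placeholder functions (`sleep_server`, `wake_up_server`, `refit_weights`, `kill_server`, `relaunch_server`) are hypothetical names, not existing SGLang APIs, and the `time.sleep` calls stand in for the real work.

```python
import time
from statistics import median
from typing import Callable, Dict, List, Sequence

Step = Callable[[], None]


def time_strategy(steps: Sequence[Step], repeats: int = 3) -> List[float]:
    """Run all steps in order, `repeats` times; return wall-clock seconds per run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        for step in steps:
            step()
        durations.append(time.perf_counter() - start)
    return durations


def compare(strategies: Dict[str, Sequence[Step]], repeats: int = 3) -> Dict[str, float]:
    """Return the median end-to-end time (seconds) of each named strategy."""
    return {
        name: median(time_strategy(steps, repeats))
        for name, steps in strategies.items()
    }


# Hypothetical placeholders: replace the bodies with the real operations.
# If a step launches GPU work, it should block until that work has finished
# (e.g. synchronize the device) so the timer measures the full cost.
def sleep_server() -> None:
    time.sleep(0.01)


def wake_up_server() -> None:
    time.sleep(0.01)


def refit_weights() -> None:
    time.sleep(0.01)


def kill_server() -> None:
    time.sleep(0.01)


def relaunch_server() -> None:
    time.sleep(0.01)


results = compare(
    {
        "sleep, wake up + refit": [sleep_server, wake_up_server, refit_weights],
        "kill and relaunch": [kill_server, relaunch_server],
    }
)
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.3f}s")
```

Reporting the median over a few repeats keeps a single slow run (for example, a cold file cache on the first relaunch) from dominating the comparison.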


### Related resources

_No response_
