
Commit f06ab17

[diffusion] docs: consolidate diffusion documentation into docs (sgl-project#18095)

Authored by qianyue76, gemini-code-assist[bot], and JiaxinD

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>

1 parent 7eaf866

File tree

30 files changed: +512 -1481 lines


docs/advanced_features/server_arguments.md
1 addition, 0 deletions

````diff
@@ -373,6 +373,7 @@ Please consult the documentation below and [server_args.py](https://github.com/s
 | `--kt-max-deferred-experts-per-token` | [ktransformers parameter] Maximum number of experts deferred to CPU per token. All MoE layers except the final one use this value; the final layer always uses 0. | `None` | Type: int |
 
 ## Diffusion LLM
+
 | Argument | Description | Defaults | Options |
 | --- | --- | --- | --- |
 | `--dllm-algorithm` | The diffusion LLM algorithm, such as LowConfidence. | `None` | Type: str |
````

docs/basic_usage/diffusion.md
2 additions, 2 deletions

````diff
@@ -4,7 +4,7 @@ SGLang supports two categories of diffusion models for different use cases. This
 
 ## Image & Video Generation Models
 
-For generating images and videos from text prompts, SGLang supports [many](../supported_models/image_generation/diffusion_models.md#image-generation-models) models like:
+For generating images and videos from text prompts, SGLang supports [many](../diffusion/compatibility_matrix.md) models like:
 
 - **FLUX, Qwen-Image** - High-quality image generation
 - **Wan 2.2, HunyuanVideo** - Video generation
@@ -16,4 +16,4 @@ python3 -m sglang.launch_server \
   --host 0.0.0.0 --port 30000
 ```
 
-**Full model list:** [Diffusion Models](../supported_models/image_generation/diffusion_models.md)
+**Full model list:** [Diffusion Models](../diffusion/compatibility_matrix.md)
````
4 additions, 5 deletions

````diff
@@ -5,7 +5,6 @@ The SGLang-diffusion CLI provides a quick way to access the inference pipeline f
 ## Prerequisites
 
 - A working SGLang diffusion installation and the `sglang` CLI available in `$PATH`.
-- Python 3.11+ if you plan to use the OpenAI Python SDK.
 
 
 ## Supported Arguments
@@ -35,15 +34,15 @@ The SGLang-diffusion CLI provides a quick way to access the inference pipeline f
 - `--seed {SEED}`: Random seed for reproducible generation
 
 
-#### Image/Video Configuration
+**Image/Video Configuration**
 
 - `--height {HEIGHT}`: Height of the generated output
 - `--width {WIDTH}`: Width of the generated output
 - `--num-frames {NUM_FRAMES}`: Number of frames to generate
 - `--fps {FPS}`: Frames per second for the saved output, if this is a video-generation task
 
 
-#### Output Options
+**Output Options**
 
 - `--output-path {PATH}`: Directory to save the generated video
 - `--save-output`: Whether to save the image/video to disk
@@ -168,7 +167,7 @@ When enabled, the server follows a **Generate -> Upload -> Delete** workflow:
 3. Upon successful upload, the local file is deleted.
 4. The API response returns the public URL of the uploaded object.
 
-#### Configuration
+**Configuration**
 
 Cloud storage is enabled via environment variables. Note that `boto3` must be installed separately (`pip install boto3`) to use this feature.
 
@@ -183,7 +182,7 @@ export SGLANG_S3_SECRET_ACCESS_KEY=your-secret-key
 export SGLANG_S3_ENDPOINT_URL=https://minio.example.com
 ```
 
-See [Environment Variables Documentation](environment_variables.md) for more details.
+See [Environment Variables Documentation](../environment_variables.md) for more details.
 
 
 ## Generate
````
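The generation flags documented in the diff above compose naturally into a command line. Below is a small sketch of assembling one from Python; `build_generate_cmd` is a hypothetical helper written for illustration — only the flag names come from the doc, and the default values are arbitrary examples.

```python
# Hypothetical helper (not part of SGLang): assembles an `sglang generate`
# command line from the CLI flags documented above. Flag names follow the
# doc; the defaults here are arbitrary example values.
def build_generate_cmd(model_path, prompt, height=720, width=1280,
                       num_frames=81, fps=16, output_path="outputs/"):
    return [
        "sglang", "generate",
        "--model-path", model_path,
        "--prompt", prompt,
        "--height", str(height),
        "--width", str(width),
        "--num-frames", str(num_frames),
        "--fps", str(fps),
        "--output-path", output_path,
        "--save-output",
    ]

# e.g. subprocess.run(build_generate_cmd("Qwen/Qwen-Image", "a cat on a beach"))
```

Keeping the command as a list (rather than a formatted string) avoids shell-quoting issues when the prompt contains spaces or quotes.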

python/sglang/multimodal_gen/docs/openai_api.md renamed to docs/diffusion/api/openai_api.md
15 additions, 11 deletions

````diff
@@ -2,6 +2,10 @@
 
 The SGLang diffusion HTTP server implements an OpenAI-compatible API for image and video generation, as well as LoRA adapter management.
 
+## Prerequisites
+
+- Python 3.11+ if you plan to use the OpenAI Python SDK.
+
 ## Serve
 
 Launch the server using the `sglang serve` command.
@@ -25,7 +29,7 @@ sglang serve "${SERVER_ARGS[@]}"
 - **--model-path**: Path to the model or model ID.
 - **--port**: HTTP port to listen on (default: `30000`).
 
-#### Get Model Information
+**Get Model Information**
 
 **Endpoint:** `GET /models`
 
@@ -59,7 +63,7 @@ curl -sS -X GET "http://localhost:30010/models"
 
 The server implements an OpenAI-compatible Images API under the `/v1/images` namespace.
 
-#### Create an image
+**Create an image**
 
 **Endpoint:** `POST /v1/images/generations`
 
@@ -100,7 +104,7 @@ curl -sS -X POST "http://localhost:30010/v1/images/generations" \
 > **Note**
 > The `response_format=url` option is not supported for `POST /v1/images/generations` and will return a `400` error.
 
-#### Edit an image
+**Edit an image**
 
 **Endpoint:** `POST /v1/images/edits`
 
@@ -130,7 +134,7 @@ curl -sS -X POST "http://localhost:30010/v1/images/edits" \
 -F "response_format=url"
 ```
 
-#### Download image content
+**Download image content**
 
 When `response_format=url` is used with `POST /v1/images/edits`, the API returns a relative URL like `/v1/images/<IMAGE_ID>/content`.
 
@@ -148,7 +152,7 @@ curl -sS -L "http://localhost:30010/v1/images/<IMAGE_ID>/content" \
 
 The server implements a subset of the OpenAI Videos API under the `/v1/videos` namespace.
 
-#### Create a video
+**Create a video**
 
 **Endpoint:** `POST /v1/videos`
 
@@ -178,7 +182,7 @@ curl -sS -X POST "http://localhost:30010/v1/videos" \
 }'
 ```
 
-#### List videos
+**List videos**
 
 **Endpoint:** `GET /v1/videos`
 
@@ -197,7 +201,7 @@ curl -sS -X GET "http://localhost:30010/v1/videos" \
 -H "Authorization: Bearer sk-proj-1234567890"
 ```
 
-#### Download video content
+**Download video content**
 
 **Endpoint:** `GET /v1/videos/{video_id}/content`
 
@@ -239,7 +243,7 @@ The server supports dynamic loading, merging, and unmerging of LoRA adapters.
 - Switching: To switch LoRAs, you must first `unmerge` the current one, then `set` the new one
 - Caching: The server caches loaded LoRA weights in memory. Switching back to a previously loaded LoRA (same path) has little cost
 
-#### Set LoRA Adapter
+**Set LoRA Adapter**
 
 Loads one or more LoRA adapters and merges their weights into the model. Supports both single LoRA (backward compatible) and multiple LoRA adapters.
 
@@ -301,7 +305,7 @@ curl -X POST http://localhost:30010/v1/set_lora \
 > - Multiple LoRAs applied to the same target will be merged in order
 
 
-#### Merge LoRA Weights
+**Merge LoRA Weights**
 
 Manually merges the currently set LoRA weights into the base model.
 
@@ -323,7 +327,7 @@ curl -X POST http://localhost:30010/v1/merge_lora_weights \
 ```
 
 
-#### Unmerge LoRA Weights
+**Unmerge LoRA Weights**
 
 Unmerges the currently active LoRA weights from the base model, restoring it to its original state. This **must** be called before setting a different LoRA.
 
@@ -336,7 +340,7 @@ curl -X POST http://localhost:30010/v1/unmerge_lora_weights \
 -H "Content-Type: application/json"
 ```
 
-#### List LoRA Adapters
+**List LoRA Adapters**
 
 Returns loaded LoRA adapters and current application status per module.
 
````
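The `POST /v1/images/generations` endpoint touched in the diff above can be driven from plain Python without the OpenAI SDK. A minimal sketch: the endpoint path, the default port `30010`, and the fact that `response_format=url` is rejected with a `400` all come from the doc; the `prompt`/`size` payload fields follow general OpenAI Images API conventions and are assumptions here.

```python
import json
import urllib.request

def build_generation_request(base_url, prompt, size="1024x1024"):
    """Build (but do not send) a POST /v1/images/generations request.

    `response_format` is pinned to b64_json because the docs say `url`
    is not supported on this endpoint. The `size` field is an assumed
    OpenAI-convention parameter, not confirmed by the diff.
    """
    payload = {"prompt": prompt, "size": size, "response_format": "b64_json"}
    return urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Requires a running server: sglang serve --model-path <MODEL> --port 30010
    req = build_generation_request(
        "http://localhost:30010", "A beautiful sunset over the mountains"
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["data"][0]["b64_json"][:40])
```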

python/sglang/multimodal_gen/docs/ci_perf.md renamed to docs/diffusion/ci_perf.md
1 addition, 2 deletions

````diff
@@ -1,5 +1,4 @@
-
-## Perf baseline generation script
+## Perf Baseline Generation Script
 
 `python/sglang/multimodal_gen/test/scripts/gen_perf_baselines.py` starts a local diffusion server, issues requests for selected test cases, aggregates stage/denoise-step/E2E timings from the perf log, and writes the results back to the `scenarios` section of `perf_baselines.json`.
 
````

python/sglang/multimodal_gen/docs/support_matrix.md renamed to docs/diffusion/compatibility_matrix.md
4 additions, 4 deletions

````diff
@@ -16,7 +16,7 @@ default parameters when initializing and generating videos.
 
 ### Video Generation Models
 
-| Model Name | Hugging Face Model ID | Resolutions | TeaCache | Sliding Tile Attn | Sage Attn | Video Sparse Attention (VSA) | Sparse Linear AttentionSLA| Sage Sparse Linear AttentionSageSLA|
+| Model Name | Hugging Face Model ID | Resolutions | TeaCache | Sliding Tile Attn | Sage Attn | Video Sparse Attention (VSA) | Sparse Linear Attention (SLA) | Sage Sparse Linear Attention (SageSLA) |
 |:-----------------------------|:--------------------------------------------------|:--------------------|:--------:|:-----------------:|:---------:|:----------------------------:|:----------------------------:|:-----------------------------------------------:|
 | FastWan2.1 T2V 1.3B | `FastVideo/FastWan2.1-T2V-1.3B-Diffusers` | 480p |||||||
 | FastWan2.2 TI2V 5B Full Attn | `FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers` | 720p |||||||
@@ -34,8 +34,8 @@ default parameters when initializing and generating videos.
 | TurboWan2.1 T2V 14B 720P | `IPostYellow/TurboWan2.1-T2V-14B-720P-Diffusers` | 720p |||||||
 | TurboWan2.2 I2V A14B | `IPostYellow/TurboWan2.2-I2V-A14B-Diffusers` | 720p |||||||
 
-**Note**: <br>
-1.Wan2.2 TI2V 5B has some quality issues when performing I2V generation. We are working on fixing this issue.<br>
+**Note**:
+1.Wan2.2 TI2V 5B has some quality issues when performing I2V generation. We are working on fixing this issue.
 2.SageSLA Based on SpargeAttn. Install it first with `pip install git+https://github.com/thu-ml/SpargeAttn.git --no-build-isolation`
 
 ### Image Generation Models
@@ -55,7 +55,7 @@ default parameters when initializing and generating videos.
 
 This section lists example LoRAs that have been explicitly tested and verified with each base model in the **SGLang Diffusion** pipeline.
 
-> Important: \
+> Important:
 > LoRAs that are not listed here are not necessarily incompatible.
 > In practice, most standard LoRAs are expected to work, especially those following common Diffusers or SD-style conventions.
 > The entries below simply reflect configurations that have been manually validated by the SGLang team.
````

python/sglang/multimodal_gen/docs/contributing.md renamed to docs/diffusion/contributing.md
3 additions, 3 deletions

````diff
@@ -2,7 +2,7 @@
 
 This guide outlines the requirements for contributing to the SGLang Diffusion module (`sglang.multimodal_gen`).
 
-## 1. Commit Message Convention
+## Commit Message Convention
 
 We follow a structured commit message format to maintain a clean history.
 
@@ -21,7 +21,7 @@ We follow a structured commit message format to maintain a clean history.
 - **Scope** (Optional): `cli`, `scheduler`, `model`, `pipeline`, `docs`, etc.
 - **Subject**: Imperative mood, short and clear (e.g., "add feature" not "added feature").
 
-## 2. Performance Reporting
+## Performance Reporting
 
 For PRs that impact **latency**, **throughput**, or **memory usage**, you **should** provide a performance comparison report.
 
@@ -45,7 +45,7 @@ For PRs that impact **latency**, **throughput**, or **memory usage**, you **shou
 ```
 4. **Paste**: paste the table into the PR description
 
-## 3. CI-Based Change Protection
+## CI-Based Change Protection
 
 Consider adding tests to the `pr-test` or `nightly-test` suites to safeguard your changes, especially for PRs that:
 
````
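The commit-message convention referenced above (type, optional scope, imperative subject) lends itself to a mechanical check. The sketch below is a rough illustration only: the pattern is inferred from this commit's own subject line ("[diffusion] docs: ..."), not from the full grammar in contributing.md, so treat every part of it as an assumption.

```python
import re

# Inferred pattern, NOT the official spec: an optional bracketed tag
# (e.g. "[diffusion]"), a type word (e.g. "docs", "feat"), an optional
# parenthesized scope (e.g. "(cli)"), then ": " and a non-empty subject.
PATTERN = re.compile(
    r"^(\[[\w-]+\]\s+)?"         # optional bracketed tag
    r"(?P<type>\w+)"             # type
    r"(\((?P<scope>[\w-]+)\))?"  # optional scope
    r": (?P<subject>\S.*)$"      # subject
)

def check_commit_subject(line):
    """Return True if the first line matches the inferred convention."""
    return PATTERN.match(line) is not None
```

A hook like this could run in CI or as a local `commit-msg` git hook; the real suite's checks may differ.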

python/sglang/multimodal_gen/docs/environment_variables.md renamed to docs/diffusion/environment_variables.md
2 additions, 2 deletions

````diff
@@ -1,11 +1,11 @@
 ## Caching Acceleration
 
 These variables configure caching acceleration for Diffusion Transformer (DiT) models.
-SGLang supports multiple caching strategies - see [caching documentation](cache/caching.md) for an overview.
+SGLang supports multiple caching strategies - see [caching documentation](performance/cache/index.md) for an overview.
 
 ### Cache-DiT Configuration
 
-See [cache-dit documentation](cache/cache_dit.md) for detailed configuration.
+See [cache-dit documentation](performance/cache/cache_dit.md) for detailed configuration.
 
 | Environment Variable | Default | Description |
 |-------------------------------------|---------|------------------------------------------|
````
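Boolean toggles of the kind this file documents (for example `SGLANG_CACHE_DIT_ENABLED`, which appears in the new index page's quick reference) are typically read with a small helper like the one below. Which spellings the real server accepts is not specified in this diff; the accepted set here is a guess for illustration.

```python
import os

def env_flag(name, default=False):
    """Read an environment variable as a boolean toggle.

    Accepted truthy spellings ("1", "true", "yes", "on") are an
    assumption, not taken from the SGLang source.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```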

docs/diffusion/index.md (new file)
98 additions, 0 deletions

````diff
@@ -0,0 +1,98 @@
+# SGLang Diffusion
+
+SGLang Diffusion is an inference framework for accelerated image and video generation using diffusion models. It provides an end-to-end unified pipeline with optimized kernels and an efficient scheduler loop.
+
+## Key Features
+
+- **Broad Model Support**: Wan series, FastWan series, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux, Z-Image, GLM-Image, and more
+- **Fast Inference**: Optimized kernels, efficient scheduler loop, and Cache-DiT acceleration
+- **Ease of Use**: OpenAI-compatible API, CLI, and Python SDK
+- **Multi-Platform**: NVIDIA GPUs (H100, H200, A100, B200, 4090) and AMD GPUs (MI300X, MI325X)
+
+---
+
+## Quick Start
+
+### Installation
+
+```bash
+uv pip install "sglang[diffusion]" --prerelease=allow
+```
+
+See [Installation Guide](installation.md) for more installation methods and ROCm-specific instructions.
+
+### Basic Usage
+
+Generate an image with the CLI:
+
+```bash
+sglang generate --model-path Qwen/Qwen-Image \
+  --prompt "A beautiful sunset over the mountains" \
+  --save-output
+```
+
+Or start a server with the OpenAI-compatible API:
+
+```bash
+sglang serve --model-path Qwen/Qwen-Image --port 30010
+```
+
+---
+
+## Documentation
+
+### Getting Started
+
+- **[Installation](installation.md)** - Install SGLang Diffusion via pip, uv, Docker, or from source
+- **[Compatibility Matrix](compatibility_matrix.md)** - Supported models and optimization compatibility
+
+### Usage
+
+- **[CLI Documentation](api/cli.md)** - Command-line interface for `sglang generate` and `sglang serve`
+- **[OpenAI API](api/openai_api.md)** - OpenAI-compatible API for image/video generation and LoRA management
+
+### Performance Optimization
+
+- **[Performance Overview](performance/index.md)** - Overview of all performance optimization strategies
+- **[Attention Backends](performance/attention_backends.md)** - Available attention backends (FlashAttention, SageAttention, etc.)
+- **[Caching Strategies](performance/cache/)** - Cache-DiT and TeaCache acceleration
+- **[Profiling](performance/profiling.md)** - Profiling techniques with PyTorch Profiler and Nsight Systems
+
+### Reference
+
+- **[Environment Variables](environment_variables.md)** - Configuration via environment variables
+- **[Support New Models](support_new_models.md)** - Guide for adding new diffusion models
+- **[Contributing](contributing.md)** - Contribution guidelines and commit message conventions
+- **[CI Performance](ci_perf.md)** - Performance baseline generation script
+
+---
+
+## CLI Quick Reference
+
+### Generate (one-off generation)
+
+```bash
+sglang generate --model-path <MODEL> --prompt "<PROMPT>" --save-output
+```
+
+### Serve (HTTP server)
+
+```bash
+sglang serve --model-path <MODEL> --port 30010
+```
+
+### Enable Cache-DiT acceleration
+
+```bash
+SGLANG_CACHE_DIT_ENABLED=true sglang generate --model-path <MODEL> --prompt "<PROMPT>"
+```
+
+---
+
+## References
+
+- [SGLang GitHub](https://github.com/sgl-project/sglang)
+- [Cache-DiT](https://github.com/vipshop/cache-dit)
+- [FastVideo](https://github.com/hao-ai-lab/FastVideo)
+- [xDiT](https://github.com/xdit-project/xDiT)
+- [Diffusers](https://github.com/huggingface/diffusers)
````
