Merged
feat: diffusion router
alphabetc1 committed Feb 17, 2026
commit 2d81aedcf05caa4905ad1ac1e9f3e6fb9fbac833
2 changes: 2 additions & 0 deletions .codespellrc
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
[codespell]
ignore-words-list = te
6 changes: 3 additions & 3 deletions .gitignore
@@ -182,11 +182,11 @@ cython_debug/
.abstra/

# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/
.vscode/

# Ruff stuff:
.ruff_cache/
38 changes: 38 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,38 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-toml
      - id: check-yaml
      - id: check-ast
      - id: check-added-large-files
      - id: check-merge-conflict
      - id: debug-statements
      - id: detect-private-key
      - id: no-commit-to-branch

  - repo: https://github.com/PyCQA/isort
    rev: 5.13.2
    hooks:
      - id: isort

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.11.7
    hooks:
      - id: ruff
        args:
          - --select=F401,F821
          - --fix

  - repo: https://github.com/psf/black
    rev: 24.10.0
    hooks:
      - id: black

  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.1
    hooks:
      - id: codespell
        args: ['--config', '.codespellrc']
213 changes: 205 additions & 8 deletions README.md
@@ -1,13 +1,210 @@
# sglang-diffusion-routing

A demonstrative example of running SGLang Diffusion with a DP router, which supports `generation` (a lot of methods, including [SDE/CPS](https://github.com/sgl-project/sglang/pull/18806)), `update_weights_from_disk` in PR [18306](https://github.com/sgl-project/sglang/pull/18306), and `health_check`.
A lightweight router for SGLang diffusion workers.

1. Copy all the codes of https://github.com/radixark/miles/pull/544 to here with sincere acknowledgment.
2. Write up a detailed README on how to use SGLang Diffusion Router to launch multiple instances and send requests.
It provides worker registration, load balancing, health checking, and request proxying for diffusion generation APIs.

For example, given that we can make a Python binding of the sglang-d router:
## Highlights

1. pip install sglang-d-router (Only for local development right now, clone the repository and run `pip install .` from the root directory. No need to make a PyPi)
2. pip install "sglang[diffusion]"
3. launching command (how to use sglang-d-router to launch n sglang diffusion servers)
4. Sending demonstrative requests
- `least-request` routing by default; `round-robin` and `random` are also available.
- Background health checks with quarantine after repeated failures.
- Router APIs for worker registration, health inspection, and proxy forwarding.
- `update_weights_from_disk` broadcast to all healthy workers.
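
The three routing policies named above can be illustrated with a small sketch. This is not the router's actual code: `make_picker`, the `{url: active_requests}` mapping, and the tie-breaking rule are assumptions made for the example.

```python
import itertools
import random


def make_picker(policy="least-request", rng=None):
    """Return a function that picks one worker URL from {url: active_requests}.

    Hypothetical helper for illustration; the real router may differ.
    """
    rng = rng or random.Random()
    counter = itertools.count()  # round-robin position, kept across calls

    def pick(active):
        urls = sorted(active)  # stable order so round-robin is deterministic
        if not urls:
            raise ValueError("no workers registered")
        if policy == "least-request":
            # Fewest in-flight requests wins; ties go to the first URL (assumed).
            return min(urls, key=lambda u: active[u])
        if policy == "round-robin":
            return urls[next(counter) % len(urls)]
        if policy == "random":
            return rng.choice(urls)
        raise ValueError(f"unknown policy: {policy}")

    return pick
```

`least-request` is the only policy that looks at load, which is presumably why it is the default for long-running diffusion requests.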

## Installation

From repository root:

```bash
python3 -m venv .venv
. .venv/bin/activate
pip install .
```

Development install:

```bash
pip install -e .
```

Run tests:

```bash
pip install pytest
pytest tests/unit -v
```

Workers require SGLang diffusion support:

```bash
pip install "sglang[diffusion]"
```

## Quick Start

### 1) Start diffusion workers

```bash
# worker 1
CUDA_VISIBLE_DEVICES=0 sglang serve \
--model-path stabilityai/stable-diffusion-3-medium-diffusers \
--num-gpus 1 \
--host 127.0.0.1 \
--port 30000

# worker 2
CUDA_VISIBLE_DEVICES=1 sglang serve \
--model-path stabilityai/stable-diffusion-3-medium-diffusers \
--num-gpus 1 \
--host 127.0.0.1 \
--port 30001
```

### 2) Start the router

Script entry:

```bash
sglang-d-router --port 30080 \
--worker-urls http://localhost:30000 http://localhost:30001
```

Module entry:

```bash
python -m sglang_diffusion_routing --port 30080 \
--worker-urls http://localhost:30000 http://localhost:30001
```

Or start empty and add workers later:

```bash
sglang-d-router --port 30080
curl -X POST "http://localhost:30080/add_worker?url=http://localhost:30000"
```
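
For readers curious how the two flags shown above might be parsed, here is a minimal `argparse` sketch. It is an assumption, not the contents of `sglang_diffusion_routing.cli.main`: `build_parser` is a hypothetical name, and making `--port` required is a choice made only for this example.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(prog="sglang-d-router")
    # Assumption: the port must be given explicitly; the real default is unknown.
    parser.add_argument("--port", type=int, required=True)
    # Zero or more worker URLs, so the router can also start empty.
    parser.add_argument("--worker-urls", nargs="*", default=[])
    return parser
```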

### 3) Test the router

```bash
# Check router health
curl http://localhost:30080/health

# List registered workers
curl http://localhost:30080/list_workers

# Image generation request (SD3)
curl -X POST http://localhost:30080/generate \
-H "Content-Type: application/json" \
-d '{
"model": "stabilityai/stable-diffusion-3-medium-diffusers",
"prompt": "a cute cat",
"num_images": 1
}'

# Video generation request
curl -X POST http://localhost:30080/generate_video \
-H "Content-Type: application/json" \
-d '{
"model": "stabilityai/stable-video-diffusion",
"prompt": "a flowing river"
}'

# Check per-worker health and load
curl http://localhost:30080/health_workers
```

## Router API

- `POST /add_worker`: add worker via query (`?url=`) or JSON body.
- `GET /list_workers`: list registered workers.
- `GET /health`: aggregated router health.
- `GET /health_workers`: per-worker health and active request counts.
- `POST /generate`: forwards to worker `/v1/images/generations`.
- `POST /generate_video`: forwards to worker `/v1/videos`.
- `POST /update_weights_from_disk`: broadcast to healthy workers.
- `GET|POST|PUT|DELETE /{path}`: catch-all proxy forwarding.
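
The same endpoints can be called from Python. The sketch below only builds the requests with the standard library; the helper names are hypothetical, and percent-encoding the worker URL in the query string is a choice made here rather than something the router is known to require.

```python
import json
import urllib.parse
import urllib.request


def add_worker_request(router_url, worker_url):
    """Build POST /add_worker?url=... (hypothetical helper)."""
    query = urllib.parse.urlencode({"url": worker_url})
    return urllib.request.Request(f"{router_url}/add_worker?{query}", method="POST")


def generate_request(router_url, prompt, model, num_images=1):
    """Build POST /generate with the JSON fields used in the curl example."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "num_images": num_images}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{router_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` sends it to a running router.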

## `update_weights_from_disk` behavior

Full details: [docs/update_weights_from_disk.md](docs/update_weights_from_disk.md)

- The router forwards request payloads as-is to each healthy worker.
- The router does not validate payload schema; payload semantics are worker-defined.
- Worker servers must implement `POST /update_weights_from_disk`.

Example:

```bash
curl -X POST http://localhost:30080/update_weights_from_disk \
-H "Content-Type: application/json" \
-d '{"model_path": "/path/to/new/weights"}'
```

Response shape:

```json
{
  "results": [
    {
      "worker_url": "http://localhost:30000",
      "status_code": 200,
      "body": {
        "ok": true
      }
    }
  ]
}
```
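
Because the router returns one entry per worker, a caller usually wants to know which workers failed. A minimal sketch, assuming any 2xx `status_code` counts as success (`split_results` is a hypothetical helper, not part of this package):

```python
def split_results(response):
    """Split a broadcast response into (succeeded, failed) worker URLs."""
    ok, failed = [], []
    for item in response.get("results", []):
        # Assumption: 2xx means the worker accepted the new weights.
        target = ok if 200 <= item["status_code"] < 300 else failed
        target.append(item["worker_url"])
    return ok, failed
```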

## Benchmark Scripts

Benchmark scripts live under `tests/benchmarks/diffusion_router/` and are intended for manual runs.
They are not part of the default unit test collection (`pytest tests/unit -v`).

Single benchmark:

```bash
python tests/benchmarks/diffusion_router/bench_router.py \
--model Wan-AI/Wan2.2-T2V-A14B-Diffusers \
--num-workers 2 \
--num-prompts 20 \
--max-concurrency 4
```

Algorithm comparison:

```bash
python tests/benchmarks/diffusion_router/bench_routing_algorithms.py \
--model Wan-AI/Wan2.2-T2V-A14B-Diffusers \
--num-workers 2 \
--num-prompts 20 \
--max-concurrency 4
```

## Project Layout

```text
.
├── docs/
│   └── update_weights_from_disk.md
├── src/sglang_diffusion_routing/
│   ├── cli/
│   └── router/
├── tests/
│   ├── benchmarks/
│   │   └── diffusion_router/
│   │       ├── bench_router.py
│   │       └── bench_routing_algorithms.py
│   └── unit/
├── pyproject.toml
└── README.md
```

## Acknowledgment

This project is derived from [radixark/miles#544](https://github.com/radixark/miles/pull/544). Thanks to the original authors for their work.

## Notes

- Quarantined workers are intentionally not reintroduced automatically.
- Router responses are fully buffered; streaming passthrough is not implemented.
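
The quarantine behavior can be sketched as a small state tracker. Everything here is an assumption for illustration: the class name, the threshold of 3, and the choice to count consecutive failures are not taken from the router's code.

```python
class HealthTracker:
    """Hypothetical tracker: quarantine a worker after repeated failed checks."""

    def __init__(self, failure_threshold=3):  # threshold value is assumed
        self.failure_threshold = failure_threshold
        self.failures = {}
        self.quarantined = set()

    def record(self, url, healthy):
        if url in self.quarantined:
            return  # quarantined workers are never reintroduced automatically
        if healthy:
            self.failures[url] = 0  # a success resets the consecutive count
            return
        self.failures[url] = self.failures.get(url, 0) + 1
        if self.failures[url] >= self.failure_threshold:
            self.quarantined.add(url)
```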
65 changes: 65 additions & 0 deletions docs/update_weights_from_disk.md
@@ -0,0 +1,65 @@
# update_weights_from_disk

This document describes `POST /update_weights_from_disk` behavior in this repository.

## Router behavior

The router does not validate or transform payload fields.
It forwards the original request body to every healthy worker and returns per-worker results.

Payload semantics are therefore defined by the worker implementation, not by the router.

## Requirements

- Worker servers must implement `POST /update_weights_from_disk`.
- For SGLang workers, use a version that includes this endpoint.
- The weights at the given path must be compatible with the model and runtime the worker is serving.

## Basic example

```bash
curl -X POST http://localhost:30080/update_weights_from_disk \
-H "Content-Type: application/json" \
-d '{"model_path": "/path/to/new/weights"}'
```

## Optional fields

Some worker versions support optional fields such as `target_modules`:

```bash
curl -X POST http://localhost:30080/update_weights_from_disk \
-H "Content-Type: application/json" \
-d '{"model_path": "/path/to/weights", "target_modules": ["transformer", "vae"]}'
```

If your worker version does not support an extra field, the worker itself returns the failure, and the router reports it in that worker's entry of the per-worker results.

## Response shape

The router response includes one item per healthy worker:

```json
{
  "results": [
    {
      "worker_url": "http://localhost:10090",
      "status_code": 200,
      "body": {
        "ok": true
      }
    },
    {
      "worker_url": "http://localhost:10092",
      "status_code": 500,
      "body": {
        "error": "worker-side failure"
      }
    }
  ]
}
```

Notes:
- Quarantined workers are excluded from broadcast.
- Transport/runtime exceptions are surfaced as per-worker `status_code=502`.
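
The broadcast rules in this document can be sketched as follows. This is an illustration under assumptions, not the router's implementation: `broadcast` and the injected `send` coroutine are hypothetical, concurrency via `asyncio.gather` is a guess, and the `{"error": ...}` body for the 502 case is invented for the example.

```python
import asyncio


async def broadcast(workers, quarantined, payload, send):
    """Forward payload unchanged to every non-quarantined worker.

    `send(url, payload)` is an injected coroutine returning (status_code, body).
    """
    targets = [w for w in workers if w not in quarantined]

    async def call(url):
        try:
            status_code, body = await send(url, payload)
        except Exception as exc:
            # Transport/runtime errors become a per-worker 502 entry.
            status_code, body = 502, {"error": str(exc)}  # body shape is assumed
        return {"worker_url": url, "status_code": status_code, "body": body}

    # gather preserves the order of `targets` in its result list.
    results = await asyncio.gather(*(call(u) for u in targets))
    return {"results": list(results)}
```

Injecting `send` keeps the sketch testable without a network; a real caller could wrap an `httpx.AsyncClient` POST there.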
36 changes: 36 additions & 0 deletions pyproject.toml
@@ -0,0 +1,36 @@
[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "sglang-diffusion-routing"
version = "0.1.0"
description = "Load-balancing router for SGLang diffusion workers"
readme = "README.md"
requires-python = ">=3.10"
license = { text = "MIT" }
dependencies = [
"fastapi>=0.110",
"httpx>=0.27",
"uvicorn>=0.30",
]
classifiers = [
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Intended Audience :: Developers",
]

[project.scripts]
sglang-d-router = "sglang_diffusion_routing.cli.main:main"

[tool.setuptools]
package-dir = { "" = "src" }

[tool.setuptools.packages.find]
where = ["src"]

[tool.pytest.ini_options]
testpaths = ["tests/unit"]
5 changes: 5 additions & 0 deletions src/sglang_diffusion_routing/__init__.py
@@ -0,0 +1,5 @@
"""Public package API for sglang diffusion routing."""

from sglang_diffusion_routing.router.diffusion_router import DiffusionRouter

__all__ = ["DiffusionRouter"]
4 changes: 4 additions & 0 deletions src/sglang_diffusion_routing/__main__.py
@@ -0,0 +1,4 @@
from sglang_diffusion_routing.cli.main import main

if __name__ == "__main__":
    raise SystemExit(main())
1 change: 1 addition & 0 deletions src/sglang_diffusion_routing/cli/__init__.py
@@ -0,0 +1 @@
"""CLI package for sglang diffusion routing."""