feat: diffusion router #2
Merged: zhaochenyang20 merged 2 commits into zhaochenyang20:main from alphabetc1:feat/sglang-d-router on Feb 21, 2026
feat: diffusion router
commit 2d81aedcf05caa4905ad1ac1e9f3e6fb9fbac833
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| [codespell] | ||
| ignore-words-list = te |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| repos: | ||
| - repo: https://github.com/pre-commit/pre-commit-hooks | ||
| rev: v5.0.0 | ||
| hooks: | ||
| - id: trailing-whitespace | ||
| - id: end-of-file-fixer | ||
| - id: check-toml | ||
| - id: check-yaml | ||
| - id: check-ast | ||
| - id: check-added-large-files | ||
| - id: check-merge-conflict | ||
| - id: debug-statements | ||
| - id: detect-private-key | ||
| - id: no-commit-to-branch | ||
|
|
||
| - repo: https://github.com/PyCQA/isort | ||
| rev: 5.13.2 | ||
| hooks: | ||
| - id: isort | ||
|
|
||
| - repo: https://github.com/astral-sh/ruff-pre-commit | ||
| rev: v0.11.7 | ||
| hooks: | ||
| - id: ruff | ||
| args: | ||
| - --select=F401,F821 | ||
| - --fix | ||
|
|
||
| - repo: https://github.com/psf/black | ||
| rev: 24.10.0 | ||
| hooks: | ||
| - id: black | ||
|
|
||
| - repo: https://github.com/codespell-project/codespell | ||
| rev: v2.4.1 | ||
| hooks: | ||
| - id: codespell | ||
| args: ['--config', '.codespellrc'] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,13 +1,210 @@ | ||
| # sglang-diffusion-routing | ||
|
|
||
| A demonstrative example of running SGLang Diffusion with a DP router, which supports `generation` (a lot of methods, including [SDE/CPS](https://github.com/sgl-project/sglang/pull/18806)), `update_weights_from_disk` in PR [18306](https://github.com/sgl-project/sglang/pull/18306), and `health_check`. | ||
| A lightweight router for SGLang diffusion workers. | ||
|
|
||
| 1. Copy all the codes of https://github.com/radixark/miles/pull/544 to here with sincere acknowledgment. | ||
| 2. Write up a detailed README on how to use SGLang Diffusion Router to launch multiple instances and send requests. | ||
| It provides worker registration, load balancing, health checking, and request proxying for diffusion generation APIs. | ||
|
|
||
| For example, given that we can make a Python binding of the sglang-d router: | ||
| ## Highlights | ||
|
|
||
| 1. pip install sglang-d-router (Only for local development right now, clone the repository and run `pip install .` from the root directory. No need to make a PyPi) | ||
| 2. pip install "sglang[diffusion]" | ||
| 3. launching command (how to use sglang-d-router to launch n sglang diffusion servers) | ||
| 4. Sending demonstrative requests | ||
| - `least-request` routing by default, with `round-robin` and `random`. | ||
| - Background health checks with quarantine after repeated failures. | ||
| - Router APIs for worker registration, health inspection, and proxy forwarding. | ||
| - `update_weights_from_disk` broadcast to all healthy workers. | ||
|
|
||
| ## Installation | ||
|
|
||
| From repository root: | ||
|
|
||
| ```bash | ||
| python3 -m venv .venv | ||
| . .venv/bin/activate | ||
| pip install . | ||
| ``` | ||
|
|
||
| Development install: | ||
|
|
||
| ```bash | ||
| pip install -e . | ||
| ``` | ||
|
|
||
| Run tests: | ||
|
|
||
| ```bash | ||
| pip install pytest | ||
| pytest tests/unit -v | ||
| ``` | ||
|
|
||
| Workers require SGLang diffusion support: | ||
|
|
||
| ```bash | ||
| pip install "sglang[diffusion]" | ||
| ``` | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ### 1) Start diffusion workers | ||
|
|
||
| ```bash | ||
| # worker 1 | ||
| CUDA_VISIBLE_DEVICES=0 sglang serve \ | ||
| --model-path stabilityai/stable-diffusion-3-medium-diffusers \ | ||
| --num-gpus 1 \ | ||
| --host 127.0.0.1 \ | ||
| --port 30000 | ||
|
|
||
| # worker 2 | ||
| CUDA_VISIBLE_DEVICES=1 sglang serve \ | ||
| --model-path stabilityai/stable-diffusion-3-medium-diffusers \ | ||
| --num-gpus 1 \ | ||
| --host 127.0.0.1 \ | ||
| --port 30001 | ||
| ``` | ||
|
|
||
| ### 2) Start the router | ||
|
|
||
| Script entry: | ||
|
|
||
| ```bash | ||
| sglang-d-router --port 30080 \ | ||
| --worker-urls http://localhost:30000 http://localhost:30001 | ||
| ``` | ||
|
|
||
| Module entry: | ||
|
|
||
| ```bash | ||
| python -m sglang_diffusion_routing --port 30080 \ | ||
| --worker-urls http://localhost:30000 http://localhost:30001 | ||
| ``` | ||
|
|
||
| Or start empty and add workers later: | ||
|
|
||
| ```bash | ||
| sglang-d-router --port 30080 | ||
| curl -X POST "http://localhost:30080/add_worker?url=http://localhost:30000" | ||
| ``` | ||
|
|
||
| ### 3) Test the router | ||
|
|
||
| ```bash | ||
| # Check router health | ||
| curl http://localhost:30080/health | ||
|
|
||
| # List registered workers | ||
| curl http://localhost:30080/list_workers | ||
|
|
||
| # Image generation request (SD3) | ||
| curl -X POST http://localhost:30080/generate \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{ | ||
| "model": "stabilityai/stable-diffusion-3-medium-diffusers", | ||
| "prompt": "a cute cat", | ||
| "num_images": 1 | ||
| }' | ||
|
|
||
| # Video generation request | ||
| curl -X POST http://localhost:30080/generate_video \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{ | ||
| "model": "stabilityai/stable-video-diffusion", | ||
| "prompt": "a flowing river" | ||
| }' | ||
|
|
||
| # Check per-worker health and load | ||
| curl http://localhost:30080/health_workers | ||
| ``` | ||
|
|
||
| ## Router API | ||
|
|
||
| - `POST /add_worker`: add worker via query (`?url=`) or JSON body. | ||
| - `GET /list_workers`: list registered workers. | ||
| - `GET /health`: aggregated router health. | ||
| - `GET /health_workers`: per-worker health and active request counts. | ||
| - `POST /generate`: forwards to worker `/v1/images/generations`. | ||
| - `POST /generate_video`: forwards to worker `/v1/videos`. | ||
| - `POST /update_weights_from_disk`: broadcast to healthy workers. | ||
| - `GET|POST|PUT|DELETE /{path}`: catch-all proxy forwarding. | ||
|
|
||
| ## `update_weights_from_disk` behavior | ||
|
|
||
| Full details: [docs/update_weights_from_disk.md](docs/update_weights_from_disk.md) | ||
|
|
||
| - The router forwards request payloads as-is to each healthy worker. | ||
| - The router does not validate payload schema; payload semantics are worker-defined. | ||
| - Worker servers must implement `POST /update_weights_from_disk`. | ||
|
|
||
| Example: | ||
|
|
||
| ```bash | ||
| curl -X POST http://localhost:30080/update_weights_from_disk \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{"model_path": "/path/to/new/weights"}' | ||
| ``` | ||
|
|
||
| Response shape: | ||
|
|
||
| ```json | ||
| { | ||
| "results": [ | ||
| { | ||
| "worker_url": "http://localhost:30000", | ||
| "status_code": 200, | ||
| "body": { | ||
| "ok": true | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| ## Benchmark Scripts | ||
|
|
||
| Benchmark scripts are available under `tests/benchmarks/diffusion_router/` and are intended for manual runs. | ||
| They are not part of default unit test collection (`pytest tests/unit -v`). | ||
|
|
||
| Single benchmark: | ||
|
|
||
| ```bash | ||
| python tests/benchmarks/diffusion_router/bench_router.py \ | ||
| --model Wan-AI/Wan2.2-T2V-A14B-Diffusers \ | ||
| --num-workers 2 \ | ||
| --num-prompts 20 \ | ||
| --max-concurrency 4 | ||
| ``` | ||
|
|
||
| Algorithm comparison: | ||
|
|
||
| ```bash | ||
| python tests/benchmarks/diffusion_router/bench_routing_algorithms.py \ | ||
| --model Wan-AI/Wan2.2-T2V-A14B-Diffusers \ | ||
| --num-workers 2 \ | ||
| --num-prompts 20 \ | ||
| --max-concurrency 4 | ||
| ``` | ||
|
|
||
| ## Project Layout | ||
|
|
||
| ```text | ||
| . | ||
| ├── docs/ | ||
| │ └── update_weights_from_disk.md | ||
| ├── src/sglang_diffusion_routing/ | ||
| │ ├── cli/ | ||
| │ └── router/ | ||
| ├── tests/ | ||
| │ ├── benchmarks/ | ||
| │ │ └── diffusion_router/ | ||
| │ │ ├── bench_router.py | ||
| │ │ └── bench_routing_algorithms.py | ||
| │ └── unit/ | ||
| ├── pyproject.toml | ||
| └── README.md | ||
| ``` | ||
|
|
||
| ## Acknowledgment | ||
|
|
||
| This project is derived from [radixark/miles#544](https://github.com/radixark/miles/pull/544). Thanks to the original authors for their work. | ||
|
|
||
| ## Notes | ||
|
|
||
| - Quarantined workers are intentionally not auto-reintroduced. | ||
| - Router responses are fully buffered; streaming passthrough is not implemented. | ||
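
The routing policies and quarantine rule described in the README diff above can be sketched as follows. This is only an illustration under stated assumptions, not the router's actual implementation: the `WorkerPool` class, its method names, and the failure threshold of 3 are hypothetical. Only the policy names (`least-request`, `round-robin`, `random`) and the rule that quarantined workers are not auto-reintroduced come from the README.

```python
import itertools
import random
from dataclasses import dataclass, field


@dataclass
class WorkerPool:
    """Hypothetical pool tracking load and consecutive health-check failures."""

    max_failures: int = 3  # assumed threshold; the README only says "repeated failures"
    active: dict = field(default_factory=dict)    # worker URL -> in-flight requests
    failures: dict = field(default_factory=dict)  # worker URL -> consecutive failures
    quarantined: set = field(default_factory=set)
    _rr: itertools.count = field(default_factory=itertools.count)

    def add_worker(self, url: str) -> None:
        self.active.setdefault(url, 0)
        self.failures.setdefault(url, 0)

    def healthy(self) -> list:
        # Quarantined workers are never offered as routing candidates.
        return [url for url in self.active if url not in self.quarantined]

    def pick(self, policy: str = "least-request") -> str:
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy workers available")
        if policy == "least-request":
            # Choose the worker with the fewest in-flight requests.
            return min(candidates, key=lambda url: self.active[url])
        if policy == "round-robin":
            return candidates[next(self._rr) % len(candidates)]
        if policy == "random":
            return random.choice(candidates)
        raise ValueError(f"unknown routing policy: {policy}")

    def record_health(self, url: str, ok: bool) -> None:
        if ok:
            # A success resets the counter but does not lift an existing quarantine.
            self.failures[url] = 0
            return
        self.failures[url] += 1
        if self.failures[url] >= self.max_failures:
            self.quarantined.add(url)
```

A caller would increment `active[url]` before proxying a request and decrement it afterwards, so that `least-request` reflects current load.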
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| # update_weights_from_disk | ||
|
|
||
| This document describes `POST /update_weights_from_disk` behavior in this repository. | ||
|
|
||
| ## Router behavior | ||
|
|
||
| The router does not validate or transform payload fields. | ||
| It forwards the original request body to every healthy worker and returns per-worker results. | ||
|
|
||
| Payload semantics are therefore defined by the worker implementation, not by the router. | ||
|
|
||
| ## Requirements | ||
|
|
||
| - Worker servers must implement `POST /update_weights_from_disk`. | ||
| - For SGLang workers, use a version that includes this endpoint. | ||
| - Weights must match your worker runtime expectations. | ||
|
|
||
| ## Basic example | ||
|
|
||
| ```bash | ||
| curl -X POST http://localhost:30080/update_weights_from_disk \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{"model_path": "/path/to/new/weights"}' | ||
| ``` | ||
|
|
||
| ## Optional fields | ||
|
|
||
| Some worker versions support optional fields such as `target_modules`: | ||
|
|
||
| ```bash | ||
| curl -X POST http://localhost:30080/update_weights_from_disk \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{"model_path": "/path/to/weights", "target_modules": ["transformer", "vae"]}' | ||
| ``` | ||
|
|
||
| If your worker version does not support an extra field, the failure comes from the worker, not from the router. | ||
|
|
||
| ## Response shape | ||
|
|
||
| The router response includes one item per healthy worker: | ||
|
|
||
| ```json | ||
| { | ||
| "results": [ | ||
| { | ||
| "worker_url": "http://localhost:10090", | ||
| "status_code": 200, | ||
| "body": { | ||
| "ok": true | ||
| } | ||
| }, | ||
| { | ||
| "worker_url": "http://localhost:10092", | ||
| "status_code": 500, | ||
| "body": { | ||
| "error": "worker-side failure" | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| Notes: | ||
| - Quarantined workers are excluded from broadcast. | ||
| - Transport/runtime exceptions are surfaced as per-worker `status_code=502`. |
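
A client can inspect the per-worker results to check whether the update succeeded on every worker. The helper below is a sketch based only on the response shape documented above. The function name `summarize_update` and the choice to treat any non-2xx `status_code` as a failure are assumptions, not part of the router.

```python
import json


def summarize_update(response_body: dict) -> dict:
    """Group worker URLs by whether their status_code is in the 2xx range."""
    summary = {"ok": [], "failed": []}
    for item in response_body.get("results", []):
        status = item.get("status_code", 0)
        key = "ok" if 200 <= status < 300 else "failed"
        summary[key].append(item["worker_url"])
    return summary


# Example body copied from the response shape documented above.
example = json.loads(
    """
    {
      "results": [
        {"worker_url": "http://localhost:10090", "status_code": 200, "body": {"ok": true}},
        {"worker_url": "http://localhost:10092", "status_code": 500,
         "body": {"error": "worker-side failure"}}
      ]
    }
    """
)
summary = summarize_update(example)
print(summary)
```

Because transport and runtime exceptions are reported as `status_code=502`, they fall into the `failed` group under this rule as well.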
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| [build-system] | ||
| requires = ["setuptools>=68", "wheel"] | ||
| build-backend = "setuptools.build_meta" | ||
|
|
||
| [project] | ||
| name = "sglang-diffusion-routing" | ||
| version = "0.1.0" | ||
| description = "Load-balancing router for SGLang diffusion workers" | ||
| readme = "README.md" | ||
| requires-python = ">=3.10" | ||
| license = { text = "MIT" } | ||
| dependencies = [ | ||
| "fastapi>=0.110", | ||
| "httpx>=0.27", | ||
| "uvicorn>=0.30", | ||
| ] | ||
| classifiers = [ | ||
| "License :: OSI Approved :: MIT License", | ||
| "Programming Language :: Python :: 3", | ||
| "Programming Language :: Python :: 3 :: Only", | ||
| "Programming Language :: Python :: 3.10", | ||
| "Programming Language :: Python :: 3.11", | ||
| "Intended Audience :: Developers", | ||
| ] | ||
|
|
||
| [project.scripts] | ||
| sglang-d-router = "sglang_diffusion_routing.cli.main:main" | ||
|
|
||
| [tool.setuptools] | ||
| package-dir = { "" = "src" } | ||
|
|
||
| [tool.setuptools.packages.find] | ||
| where = ["src"] | ||
|
|
||
| [tool.pytest.ini_options] | ||
| testpaths = ["tests/unit"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| """Public package API for sglang diffusion routing.""" | ||
|
|
||
| from sglang_diffusion_routing.router.diffusion_router import DiffusionRouter | ||
|
|
||
| __all__ = ["DiffusionRouter"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| from sglang_diffusion_routing.cli.main import main | ||
|
|
||
| if __name__ == "__main__": | ||
| raise SystemExit(main()) |
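
The `raise SystemExit(main())` idiom turns the return value of `main()` into the process exit code. The sketch below illustrates this with a hypothetical `main()`. Its argument parsing is an assumption modeled on the `--port` and `--worker-urls` flags shown in the README (including the default port), not the contents of `sglang_diffusion_routing.cli.main`.

```python
import argparse
from typing import Optional


def main(argv: Optional[list] = None) -> int:
    """Hypothetical CLI entry point: parse flags and return an exit code."""
    parser = argparse.ArgumentParser(prog="sglang-d-router")
    parser.add_argument("--port", type=int, default=30080)  # assumed default
    parser.add_argument("--worker-urls", nargs="*", default=[])
    args = parser.parse_args(argv)
    print(f"would listen on port {args.port} with {len(args.worker_urls)} worker(s)")
    return 0  # a real entry point would start the server here


# Same idiom as the module above; caught here only so the exit code can be inspected.
try:
    raise SystemExit(main(["--port", "30080", "--worker-urls", "http://localhost:30000"]))
except SystemExit as exc:
    exit_code = exc.code
```

Returning an integer from `main()` rather than calling `sys.exit()` inside it keeps the function easy to call from tests.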
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| """CLI package for sglang diffusion routing.""" |