SGLang Diffusion Router

A lightweight router for SGLang diffusion workers used in RL systems. It provides worker registration, load balancing, health checking, weight refitting, and request proxying for diffusion generation APIs.

Installation

git clone --recursive https://github.com/zhaochenyang20/sglang-diffusion-routing.git
cd sglang-diffusion-routing

# If cloned without --recursive, initialize the sglang submodule:
# git submodule update --init --recursive

# Install the router package
uv pip install .

# Install SGLang diffusion from the pinned submodule (includes RL patches).
# Do NOT install sglang from PyPI — the submodule tracks a fork with
# /v1/diffusion/generate, flow-matching log-prob, and other RL features.
cd sglang/python
uv pip install ".[diffusion]" --prerelease=allow
cd ../..

Quick Start

Co-Launch Workers and Router

Instead of starting workers manually, you can let the router spawn and manage them via a YAML config file.

sglang-d-router --port 30081 --launcher-config examples/local_launcher.yaml

launcher:
  backend: local
  model: Qwen/Qwen-Image
  num_workers: 8
  num_gpus_per_worker: 1
  worker_base_port: 10090
  wait_timeout: 600

Manual Launch Workers

# If you cannot connect to HuggingFace, set the environment variable
# SGLANG_USE_MODELSCOPE=TRUE before launching the workers.

# worker 1
CUDA_VISIBLE_DEVICES=0 sglang serve \
    --model-path Qwen/Qwen-Image \
    --num-gpus 1 \
    --host 127.0.0.1 \
    --port 30000

# worker 2
CUDA_VISIBLE_DEVICES=1 sglang serve \
    --model-path Qwen/Qwen-Image \
    --num-gpus 1 \
    --host 127.0.0.1 \
    --port 30002

sglang-d-router --port 30081 \
    --worker-urls http://localhost:30000 http://localhost:30002
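
With either launch path, the router and workers need some time before they can serve traffic, so a client script may want to block until the health probe answers. The following is a minimal sketch rather than part of the router package: `http_probe` and `wait_until_healthy` are hypothetical helper names, and treating an HTTP 200 from GET /health as "ready" is an assumption about the probe's behavior.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if GET `url` answers with HTTP 200 (assumed to mean 'ready')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: not ready yet.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A call such as `wait_until_healthy(lambda: http_probe("http://localhost:30081/health"))` would then gate the first generation request. The probe is injected as a callable so the waiting logic can be exercised without a running router.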

Demonstrative Examples

A typical RL loop looks like:

1. Start workers   → sglang-d-router --port 30081 --launcher-config examples/local_launcher.yaml
2. Rollout         → POST /v1/images/generations or /v1/diffusion/generate
3. Sleep workers   → POST /release_memory_occupation
4. Train on rollout data (GPU memory now free for training)
5. Wake workers    → POST /resume_memory_occupation
6. Refit weights   → POST /update_weights_from_disk
7. Repeat from step 2

We provide all the steps in the code examples below.
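
Steps 2 through 6 of the loop can also be condensed into a single function. This is a sketch under stated assumptions: `run_rl_iteration`, the injected `post(path, payload)` helper, and the `train` callback that returns a checkpoint path are all hypothetical names introduced for illustration, and only the endpoint paths and the `model_path`, `prompt`, and `get_log_probs` fields come from this README.

```python
from typing import Any, Callable, Dict, List

# post(path, payload) -> decoded JSON response. It is injected so the loop
# does not depend on a particular HTTP client (hypothetical signature).
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_rl_iteration(
    post: PostFn,
    train: Callable[[List[Dict[str, Any]]], str],
    prompts: List[str],
) -> str:
    """Run steps 2-6 once and return the checkpoint path used for the refit."""
    # Step 2: rollout through the router.
    rollouts = [
        post("/v1/diffusion/generate", {"prompt": p, "get_log_probs": True})
        for p in prompts
    ]
    # Step 3: sleep workers so the trainer can use the freed GPU memory.
    post("/release_memory_occupation", {})
    # Step 4: train; `train` is user code that returns a checkpoint path.
    checkpoint_path = train(rollouts)
    # Step 5: wake workers.
    post("/resume_memory_occupation", {})
    # Step 6: refit the workers with the new weights.
    post("/update_weights_from_disk", {"model_path": checkpoint_path})
    return checkpoint_path
```

With `requests`, the helper could be as small as `post = lambda path, payload: requests.post(f"{ROUTER}{path}", json=payload).json()`. Error handling and per-step timeouts are left out to keep the ordering of the calls visible.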

With Python Requests

import requests
import base64

ROUTER = "http://localhost:30081"

# Check router health
resp = requests.get(f"{ROUTER}/health")
print(resp.json())

# Register a worker
resp = requests.post(f"{ROUTER}/workers", json={"url": "http://localhost:30000"})
print(resp.json())

# List registered workers (with health/load)
resp = requests.get(f"{ROUTER}/workers")
print(resp.json())
worker_id = resp.json()["workers"][0]["worker_id"]

# Get / update worker details
resp = requests.get(f"{ROUTER}/workers/{worker_id}")
print(resp.json())
resp = requests.put(
    f"{ROUTER}/workers/{worker_id}",
    json={"is_dead": False, "refresh_video_support": True},
)
print(resp.json())

# Image generation request (OpenAI-compatible, returns base64-encoded image)
resp = requests.post(f"{ROUTER}/v1/images/generations", json={
    "model": "Qwen/Qwen-Image",
    "prompt": "a cute cat",
    "num_images": 1,
    "response_format": "b64_json",
})
data = resp.json()
print(data)

# Decode and save the image locally
img = base64.b64decode(data["data"][0]["b64_json"])
with open("output.png", "wb") as f:
    f.write(img)
print("Saved to output.png")

# Video generation request
# Note that image-only models such as Qwen/Qwen-Image do not support video
# generation, so this request will fail as written. Use a video-capable model instead.

resp = requests.post(f"{ROUTER}/v1/videos", json={
    "model": "Qwen/Qwen-Image",
    "prompt": "a flowing river",
})
print(resp.json())
video_id = resp.json().get("video_id") or resp.json().get("id")
if video_id:
    print(requests.get(f"{ROUTER}/v1/videos/{video_id}").json())

# Update weights from disk
resp = requests.post(f"{ROUTER}/update_weights_from_disk", json={
    "model_path": "Qwen/Qwen-Image-2512",
})
print(resp.json())

# Sleep workers (release GPU memory)
resp = requests.post(f"{ROUTER}/release_memory_occupation", json={})
print(resp.json())

# Wake workers (resume GPU memory)
resp = requests.post(f"{ROUTER}/resume_memory_occupation", json={})
print(resp.json())

Native Diffusion Generate Endpoint (with Trajectory & Log-Prob)

The /v1/diffusion/generate endpoint exposes trajectory data (latents, timesteps) and log-probabilities that the OpenAI-compatible endpoints intentionally omit. This is intended for RL training pipelines that need intermediate diffusion outputs.

import requests
import base64
import io
import numpy as np

ROUTER = "http://localhost:30081"

# Generate with trajectory latents and log-probs
resp = requests.post(f"{ROUTER}/v1/diffusion/generate", json={
    "prompt": "a cute cat",
    "width": 512,
    "height": 512,
    "num_inference_steps": 28,
    "guidance_scale": 7.0,
    "seed": 42,
    "get_latents": True,
    "get_log_probs": True,
})
data = resp.json()
print(f"Inference time: {data.get('inference_time_s')}s")

# Decode the output image
img_bytes = base64.b64decode(data["output_b64"])
with open("output.png", "wb") as f:
    f.write(img_bytes)
print(f"Saved image ({data.get('output_format', 'unknown')} format)")

# Decode trajectory data
trajectory = data.get("trajectory")
latents = np.load(io.BytesIO(base64.b64decode(trajectory["latents"])))
print(f"Latents shape: {trajectory['latents_shape']}, dtype: {trajectory['latents_dtype']}")
print(f"Decoded latents array shape: {latents.shape}")

timesteps = [np.load(io.BytesIO(base64.b64decode(t))) for t in trajectory["timesteps"]]
print(f"Timesteps count: {len(timesteps)}")

log_probs = np.load(io.BytesIO(base64.b64decode(trajectory["log_probs"])))
print(f"Log-probs shape: {trajectory['log_probs_shape']}")
print(f"Decoded log-probs array shape: {log_probs.shape}")
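
The example above indexes into `trajectory` directly, which raises a `TypeError` if the response carries no trajectory. A small guard can make that failure explicit before any array decoding happens. This is a sketch, assuming a missing trajectory shows up as an absent or null `trajectory` field; `trajectory_npy_bytes` is a hypothetical helper name, and the `latents` and `log_probs` keys are taken from the example above.

```python
import base64
from typing import Any, Dict


def trajectory_npy_bytes(data: Dict[str, Any]) -> Dict[str, bytes]:
    """Return the raw .npy payloads for the trajectory fields that are present."""
    trajectory = data.get("trajectory")
    if trajectory is None:
        raise ValueError(
            "response has no 'trajectory'; were get_latents/get_log_probs set?"
        )
    payloads: Dict[str, bytes] = {}
    for key in ("latents", "log_probs"):
        encoded = trajectory.get(key)
        if encoded is not None:
            # validate=True rejects non-base64 characters instead of skipping them.
            payloads[key] = base64.b64decode(encoded, validate=True)
    return payloads
```

Each returned value can then be passed to `np.load(io.BytesIO(...))` exactly as in the example above.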

Rollout with SDE/CPS Log-Prob Computation

For RL training, you can enable rollout mode with flow-matching SDE or CPS (Conditional Probability Score) log-probability computation. This is supported on both the OpenAI-compatible endpoints and the native diffusion endpoint.

import requests

ROUTER = "http://localhost:30081"

# Rollout with SDE log-prob (default)
resp = requests.post(f"{ROUTER}/v1/images/generations", json={
    "model": "stabilityai/stable-diffusion-3-medium-diffusers",
    "prompt": "a cute cat",
    "width": 512,
    "height": 512,
    "num_inference_steps": 28,
    "rollout": True,
    "rollout_sde_type": "sde",       # "sde" or "cps"
    "rollout_noise_level": 0.7,
    "response_format": "b64_json",
})
print(resp.json())

# Rollout with CPS log-prob
resp = requests.post(f"{ROUTER}/v1/images/generations", json={
    "model": "stabilityai/stable-diffusion-3-medium-diffusers",
    "prompt": "a cute cat",
    "width": 512,
    "height": 512,
    "num_inference_steps": 28,
    "rollout": True,
    "rollout_sde_type": "cps",
    "rollout_noise_level": 0.5,
    "response_format": "b64_json",
})
print(resp.json())

# Rollout via native diffusion endpoint (with trajectory + log-probs)
resp = requests.post(f"{ROUTER}/v1/diffusion/generate", json={
    "prompt": "a cute cat",
    "width": 512,
    "height": 512,
    "num_inference_steps": 28,
    "guidance_scale": 7.0,
    "seed": 42,
    "get_latents": True,
    "get_log_probs": True,
})
data = resp.json()
print(f"Trajectory available: {data.get('trajectory') is not None}")
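
The two OpenAI-compatible rollout requests above differ only in `rollout_sde_type` and `rollout_noise_level`, so a small payload builder can cut the repetition and catch a mistyped SDE type before the request is sent. This is a sketch: `build_rollout_payload` is a hypothetical helper, the allowed types are just the two values shown above, and the `0.7` default mirrors the first example rather than a documented server default.

```python
from typing import Any, Dict

# The two values used in the examples above.
ROLLOUT_SDE_TYPES = ("sde", "cps")


def build_rollout_payload(
    model: str,
    prompt: str,
    sde_type: str = "sde",
    noise_level: float = 0.7,
    **extra: Any,
) -> Dict[str, Any]:
    """Build a JSON body for a rollout request on /v1/images/generations."""
    if sde_type not in ROLLOUT_SDE_TYPES:
        raise ValueError(
            f"rollout_sde_type must be one of {ROLLOUT_SDE_TYPES}, got {sde_type!r}"
        )
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "rollout": True,
        "rollout_sde_type": sde_type,
        "rollout_noise_level": noise_level,
        "response_format": "b64_json",
    }
    # Extra fields such as width, height, or num_inference_steps pass through as-is.
    payload.update(extra)
    return payload
```

For instance, `build_rollout_payload("stabilityai/stable-diffusion-3-medium-diffusers", "a cute cat", sde_type="cps", noise_level=0.5, width=512, height=512)` reproduces most of the second request above.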

Router API

Inference Endpoints

Method	Path	Description
`POST`	`/v1/images/generations`	OpenAI-compatible text-to-image generation
`POST`	`/v1/diffusion/generate`	Native SGLang-D generation with trajectory & log-prob support
`POST`	`/v1/videos`	Entrypoint for text-to-video generation

Videos Result Query

Method	Path	Description
`GET`	`/v1/videos`	List or poll video jobs
`GET`	`/v1/videos/{video_id}`	Get status/details of a single video job
`GET`	`/v1/videos/{video_id}/content`	Download generated video content

Video query routing is stable by video_id: the router caches the video_id -> worker mapping when a job is created (POST /v1/videos) and forwards later detail and content queries to that same worker. An unknown video_id returns 404.
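
The routing rule for video queries can be pictured as a plain lookup table. The class below is an illustrative sketch only, not the router's actual implementation; `VideoRouteCache` and its method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class VideoRouteCache:
    """Remember which worker created each video job."""

    def __init__(self) -> None:
        self._worker_by_video: Dict[str, str] = {}

    def record_create(self, video_id: str, worker_url: str) -> None:
        """Store the mapping after a successful POST /v1/videos."""
        self._worker_by_video[video_id] = worker_url

    def resolve(self, video_id: str) -> Tuple[int, Optional[str]]:
        """Return (200, worker_url) for a known job, or (404, None) otherwise."""
        worker_url = self._worker_by_video.get(video_id)
        if worker_url is None:
            return 404, None
        return 200, worker_url
```

Because the mapping is fixed at creation time, a status poll never lands on a worker that has no record of the job, regardless of how later requests are balanced.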

Model Discovery and Health Checks

Method	Path	Description
`GET`	`/v1/models`	OpenAI-style model discovery
`GET`	`/health`	Basic health probe

GET /v1/models aggregates model lists from healthy workers and de-duplicates by model id.
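
The aggregation described above can be sketched as follows. The sketch assumes each worker answers with the OpenAI-style shape `{"object": "list", "data": [{"id": ...}, ...]}` and that the first entry seen for a given id wins; both points are assumptions, and `aggregate_models` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List


def aggregate_models(worker_responses: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-worker model lists, keeping the first entry for each model id."""
    seen = set()
    merged: List[Dict[str, Any]] = []
    for response in worker_responses:
        for model in response.get("data", []):
            model_id = model.get("id")
            if model_id is None or model_id in seen:
                continue
            seen.add(model_id)
            merged.append(model)
    return {"object": "list", "data": merged}
```

Only responses from healthy workers would be passed in; filtering by health is left to the caller here.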

Worker Management APIs

Method	Path	Description
`POST`	`/workers`	Register a worker
`GET`	`/workers`	List workers (including health/load)
`GET`	`/workers/{worker_id}`	Get worker details
`PUT`	`/workers/{worker_id}`	Update worker configuration
`DELETE`	`/workers/{worker_id}`	Deregister a worker

worker_id is the URL-encoded worker URL (machine-oriented), and each worker payload also includes display_id as a human-readable ID.
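
When a worker URL is known but its worker_id has not been fetched yet, the ID can in principle be derived locally. This is a sketch, assuming the router uses standard percent-encoding with no characters left unescaped; the helper names are hypothetical, and taking worker_id from the GET /workers response remains the safer option.

```python
from urllib.parse import quote, unquote


def worker_id_from_url(worker_url: str) -> str:
    """Percent-encode a worker URL so it fits in a single path segment."""
    # safe="" also escapes "/" and ":", which quote() would otherwise keep or pass through.
    return quote(worker_url, safe="")


def worker_url_from_id(worker_id: str) -> str:
    """Invert worker_id_from_url."""
    return unquote(worker_id)
```

Under this assumption, `http://localhost:30000` maps to `http%3A%2F%2Flocalhost%3A30000`.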

PUT /workers/{worker_id} currently supports:

is_dead (boolean): quarantine (true) or recover (false) this worker.
refresh_video_support (boolean): re-probe worker /v1/models capability.

RL Related API

Method	Path	Description
`POST`	`/update_weights_from_disk`	Reload weights from disk on all healthy workers
`POST`	`/release_memory_occupation`	Sleep all healthy workers (release GPU memory)
`POST`	`/resume_memory_occupation`	Wake all sleeping workers (resume GPU memory)

Both sleep and wake are idempotent. While workers are sleeping, the router rejects generation requests with a 503. A typical RL loop: wake → refit weights → rollout → sleep → train → repeat.
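
A client that may send a generation request while workers are asleep can react to the 503 by waking them and retrying. The following is a sketch with hypothetical names (`generate_with_wake`, and an injected `post(path, payload)` that returns a status code and a decoded body). It assumes the 503 was caused by sleeping workers, which the status code alone does not prove, and it is only appropriate when no training job is currently using the released GPU memory.

```python
from typing import Any, Callable, Dict, Tuple

# post(path, payload) -> (HTTP status code, decoded JSON body); hypothetical signature.
StatusPostFn = Callable[[str, Dict[str, Any]], Tuple[int, Dict[str, Any]]]


def generate_with_wake(
    post: StatusPostFn,
    path: str,
    payload: Dict[str, Any],
    max_wakes: int = 1,
) -> Tuple[int, Dict[str, Any]]:
    """Send a generation request; on 503, wake the workers and retry."""
    status, body = post(path, payload)
    wakes = 0
    while status == 503 and wakes < max_wakes:
        # Waking is idempotent, so the call is harmless if workers are already awake.
        post("/resume_memory_occupation", {})
        wakes += 1
        status, body = post(path, payload)
    return status, body
```

Capping the number of wake attempts keeps the helper from looping forever when the 503 has a different cause.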

Acknowledgment

This project is derived from radixark/miles#544 and alphabetc1/sglang. Thanks to the original authors.

The SGLang Diffusion RL team is responsible for the development and maintenance of this project. Team members, in alphabetical order:

Banghua Zhu, Chengliang Qian, Chenyang Zhao, Fenglin Yu, Hao Jin, Huapeng Zhou, Jiajun Li, Kangrui Du, Kun Lin, Mao Cheng, Mengyang Liu, Qiujiang Chen, Shenggui Li, Shirui Chen, Shuwen Wang, Xi Chen, Xiaole Guo, Ying Sheng, Yueming Yuan, Yuhao Yang, Yusheng Su, Zhiheng Ye
