[Feature] Implement "Miles Diffusion Router" for Workload-Aware Rollouts #541

@zhaochenyang20

Motivation

To support large-scale RL rollouts and high-throughput generation, we need to implement a dedicated router for the diffusion engine, tentatively named Diffusion Router.

This router will build upon the concepts of Cache-Aware Load Balancing and Data Parallel (DP) Routing used in the SGLang LLM engine. The goal is to implement a "workload-minimal" routing strategy that ensures requests are distributed to the most available or appropriate engine instances while maintaining system health.

Goals

  1. Core Infrastructure: Create a standalone demo/implementation of a router tailored for sglang-diffusion instances.
  2. Interface Support: Implement the following three critical API interfaces:
     • health_check: Monitor the status of downstream diffusion workers.
     • generate: Route generation requests based on current workload/availability.
     • update_weights_from_disk: Interface placeholder (can be a stub for now) to support future dynamic weight updates.
  3. Minimalist Routing: Focus on low-latency, workload-aware distribution to minimize generation bottlenecks during rollouts.
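
To make the three interfaces concrete, here is a minimal in-process sketch of what the router surface might look like. Everything in it is an assumption for discussion: the `Worker` and `DiffusionRouter` names, the `send` callable (which stands in for an HTTP call to a backend), and the response shapes are hypothetical and are not taken from SGLang or sglang-diffusion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Worker:
    """Hypothetical handle for one sglang-diffusion backend."""
    url: str
    # (endpoint, payload) -> response dict; a stand-in for a real HTTP client.
    send: Callable[[str, dict], dict]
    healthy: bool = True
    in_flight: int = 0


class DiffusionRouter:
    def __init__(self, workers: List[Worker]):
        self.workers = workers

    def health_check(self) -> Dict[str, bool]:
        """Probe every worker and record whether it answered successfully."""
        for w in self.workers:
            try:
                w.healthy = bool(w.send("health_check", {}).get("ok", False))
            except Exception:
                # Any transport error marks the worker as unavailable.
                w.healthy = False
        return {w.url: w.healthy for w in self.workers}

    def generate(self, payload: dict) -> dict:
        """Forward the request to the healthy worker with the fewest in-flight requests."""
        candidates = [w for w in self.workers if w.healthy]
        if not candidates:
            raise RuntimeError("no healthy diffusion workers")
        worker = min(candidates, key=lambda w: w.in_flight)
        worker.in_flight += 1
        try:
            return worker.send("generate", payload)
        finally:
            worker.in_flight -= 1

    def update_weights_from_disk(self, model_path: str) -> dict:
        """Stub only: reports that weight updates are not wired up yet."""
        return {"status": "not_implemented", "model_path": model_path}
```

In a real deployment `send` would be replaced by requests to each backend's HTTP endpoints, and `generate` would need to be safe under concurrent calls; both are left out here to keep the interface shape visible.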

Technical Tasks

  • Study the existing SGLang Router implementation for LLMs (see resources below).
  • Develop the miles-diffusion-router script/module.
  • Implement basic load balancing logic (e.g., Least-Request or Round-Robin as a baseline, moving toward workload-minimal).
  • Create a demo script showing the router coordinating multiple sglang-diffusion backends.
  • Document the setup process and API usage.
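
As a starting point for the load balancing task, the two baselines could be sketched as follows. The class names and the `cost` parameter are hypothetical. With the default `cost=1.0` the second policy is plain Least-Request; passing a per-request cost estimate (for example one derived from step count and resolution, which is an assumption and not something this issue defines) is one possible way to move toward a workload-minimal strategy.

```python
import itertools
from typing import Dict, List


class RoundRobinPolicy:
    """Baseline: cycle through workers in a fixed order, ignoring load."""

    def __init__(self, workers: List[str]):
        self._cycle = itertools.cycle(workers)

    def acquire(self) -> str:
        return next(self._cycle)


class LeastLoadPolicy:
    """Pick the worker with the smallest outstanding load.

    With cost=1.0 per request this is Least-Request. A caller-supplied
    cost estimate turns it into a simple workload-aware policy.
    """

    def __init__(self, workers: List[str]):
        self.load: Dict[str, float] = {w: 0.0 for w in workers}

    def acquire(self, cost: float = 1.0) -> str:
        # Ties go to the worker listed first.
        worker = min(self.load, key=self.load.get)
        self.load[worker] += cost
        return worker

    def release(self, worker: str, cost: float = 1.0) -> None:
        # Call when the request finishes, with the same cost used in acquire().
        self.load[worker] -= cost
```

A short usage example: `LeastLoadPolicy(["w0", "w1"])` sends a first request with `cost=10` to `w0`, and the next two requests with `cost=1` both go to `w1`, since its outstanding load stays below that of `w0`.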

Resources

Calling community members interested in distributed systems and RL infrastructure! Help us build the backbone of the Miles rollout system.
