addressed comments: docs
sniper35 committed Feb 8, 2026
commit d39899d62193a02d7f6b67a36a9e4dabbbac8a2a
4 changes: 2 additions & 2 deletions examples/diffusion_router/README.md
@@ -25,7 +25,7 @@ curl -X POST 'http://localhost:30080/add_worker?url=http://localhost:10090'
| `POST` | `/add_worker` | Register a diffusion worker (`?url=...` or JSON body) |
| `GET` | `/list_workers` | List registered workers |
| `POST` | `/update_weights_from_disk` | Broadcast weight reload to all workers |
-| `*` | `/{path}` | Catch-all proxy to least-loaded worker |
+| `GET, POST, PUT, DELETE` | `/{path}` | Catch-all proxy to least-loaded worker |

## Load Balancing

@@ -62,7 +62,7 @@ curl -X POST http://localhost:30080/update_weights_from_disk \
--port Port (default: 30080)
--worker-urls Initial worker URLs
--max-connections Max concurrent connections (default: 100)
---timeout Request timeout in seconds
+--timeout Request timeout in seconds for router-to-worker requests
--health-check-interval Seconds between health checks (default: 10)
--health-check-failure-threshold Failures before quarantine (default: 3)
--verbose Enable verbose logging
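
The README rows in this hunk describe a catch-all proxy that forwards to the least-loaded worker. This diff does not show how the router measures load, so the following is only a minimal sketch under the assumption that load means the number of in-flight requests per worker. The function name `pick_least_loaded` and the `active_requests` mapping are hypothetical; `dead_workers` mirrors the set used in `diffusion_router.py`.

```python
from typing import Dict, Optional, Set


def pick_least_loaded(
    active_requests: Dict[str, int],  # hypothetical: worker URL -> in-flight request count
    dead_workers: Set[str],           # mirrors self.dead_workers in the router
) -> Optional[str]:
    """Return the live worker URL with the fewest in-flight requests, or None."""
    # Dead workers are never candidates, matching the permanent exclusion in the router.
    live = {url: n for url, n in active_requests.items() if url not in dead_workers}
    if not live:
        return None
    # min() over the dict keys, ordered by their in-flight count.
    return min(live, key=live.get)
```

A caller would typically increment the chosen worker's count before proxying and decrement it when the response completes, so the counts reflect current load.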
2 changes: 2 additions & 0 deletions miles/router/diffusion_router.py
@@ -95,6 +95,8 @@ async def _health_check_loop(self):
f"[diffusion-router] Worker {url} failed {threshold} consecutive checks. Marking DEAD."
)
self.dead_workers.add(url)
+# Dead workers are permanently excluded. Reconnecting them
+# would risk serving stale weights after training has moved on.
else:
self.worker_failure_counts[url] = 0
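
The bookkeeping visible in this hunk (a per-worker failure counter that resets on success, and a `dead_workers` set that is never cleared) can be sketched in isolation as follows. This is an illustration under stated assumptions, not the router's actual code: the `FailureTracker` class and its `record` method are hypothetical, and the default threshold of 3 is taken from the README's `--health-check-failure-threshold` entry.

```python
from typing import Dict, Set


class FailureTracker:
    """Hypothetical stand-in for the router's health-check bookkeeping."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.worker_failure_counts: Dict[str, int] = {}
        self.dead_workers: Set[str] = set()

    def record(self, url: str, healthy: bool) -> None:
        """Record one health-check result for a worker."""
        # Dead workers stay dead: a later success must not revive them,
        # since they may hold stale weights.
        if url in self.dead_workers:
            return
        if healthy:
            # Any success resets the consecutive-failure count.
            self.worker_failure_counts[url] = 0
            return
        count = self.worker_failure_counts.get(url, 0) + 1
        self.worker_failure_counts[url] = count
        if count >= self.threshold:
            self.dead_workers.add(url)
```

Only consecutive failures count toward the threshold, so a worker that fails twice, succeeds once, and fails twice more is still considered live.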
