Description
Quick Check ✨
- I've taken a look at existing feature requests
- This feature request relates to GAIA UI (Open-WebUI)
What's on your mind?
I would like the ability to deploy GAIA as a Dockerized service while keeping the Lemonade inference backend running on separate hardware within my local network.
This mirrors how I previously ran Open WebUI with Ollama before switching to AMD tooling: the frontend and the inference service were decoupled and deployed on separate machines.
Benefits:
1. Routing and Load Balancing: Enables directing requests to specific backend nodes or distributing workloads across multiple inference hosts. For example, if multiple Strix Halo systems are available, each could be dedicated to certain models or used in a load-balanced, multi-user local environment with moderate to high request volume.
2. Maintenance Resilience: Allows performing maintenance, updates, or hardware changes on backend servers without losing access to GAIA’s local conversation history or disrupting the frontend.
3. Hardware Optimization: Permits placement of GAIA’s UI service on lightweight, low-power hardware (or in a VM) while dedicating high-performance compute systems purely to inference.
4. Scalability: Makes it easier to expand capacity by adding more inference backends without redeploying or modifying the frontend service.
Proposed Functionality:
• GAIA should be configurable to point to a remote Lemonade backend over the LAN, ideally via environment variables or a config file (see the deployment sketch after this list).
• Support for specifying multiple backend endpoints for routing or load balancing would be ideal (see the load-balancing sketch after this list).
• The Docker container should expose only the GAIA frontend, without bundling the inference backend, to allow clean separation of concerns.
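To make the request concrete, here is a minimal docker-compose sketch of a frontend-only GAIA container pointing at a Lemonade host elsewhere on the LAN. The variable name `LEMONADE_BASE_URL`, the image name, the ports, and the data path are assumptions for illustration, not GAIA's current interface; Lemonade Server is assumed to expose an OpenAI-compatible API on port 8000.

```yaml
# docker-compose.yml — GAIA frontend only; inference runs on a separate machine.
# LEMONADE_BASE_URL, the image name, ports, and paths are hypothetical placeholders.
services:
  gaia-ui:
    image: gaia-ui:latest                # assumed frontend-only image
    ports:
      - "8080:8080"                      # UI exposed to the LAN
    environment:
      # Remote Lemonade host; assumes an OpenAI-compatible API on port 8000.
      LEMONADE_BASE_URL: "http://192.168.1.50:8000/api/v1"
    volumes:
      - gaia-data:/app/data              # conversation history stays with the frontend

volumes:
  gaia-data:
```

With this layout, the backend host can be rebooted, updated, or swapped without touching the container that holds the conversation history.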
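For the multi-backend case, the frontend would not need to change at all: `LEMONADE_BASE_URL` could simply point at a reverse proxy that fans requests out over the inference hosts. A sketch of such a pool using Traefik's file-provider format (host addresses and the path rule are placeholders):

```yaml
# dynamic.yml — Traefik dynamic configuration: a load-balanced pool of Lemonade hosts.
# GAIA would point at the proxy (e.g. http://proxy-host:9000) instead of a single backend.
http:
  routers:
    lemonade:
      rule: "PathPrefix(`/api`)"              # forward all API traffic to the pool
      service: lemonade-pool
  services:
    lemonade-pool:
      loadBalancer:
        servers:
          - url: "http://192.168.1.50:8000"   # Strix Halo node A
          - url: "http://192.168.1.51:8000"   # Strix Halo node B
```

Any equivalent reverse proxy (nginx, HAProxy) would work the same way; the point is only that the frontend sees one stable endpoint while inference capacity is added or removed behind it.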