This issue tracks orchestration of multiple containers. There are two services: one serves the large language model (LLM) itself, while the other is a front end providing a secure access point to it. The ideal design would be a single container, but the LLM service requires significantly more compute resources and optimization than the front end, so the two are kept separate.
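A minimal sketch of how the two services could be orchestrated, assuming Docker Compose; the image names, port numbers, and memory limit below are illustrative placeholders, not decisions made in this issue:

```yaml
# Hypothetical two-service layout: only the front end is published externally;
# the LLM service is reachable solely over the internal Compose network.
services:
  llm:
    image: llm-server:latest        # placeholder image for the LLM service
    expose:
      - "8000"                      # internal-only port, not published to the host
    deploy:
      resources:
        limits:
          memory: 16g               # placeholder: LLM needs far more memory than the front end
  frontend:
    image: frontend:latest          # placeholder image for the secure front end
    ports:
      - "443:443"                   # the only externally published endpoint
    depends_on:
      - llm                         # start the LLM service before the front end
```

Keeping the LLM behind `expose` rather than `ports` means the front end remains the single secure access point, matching the design described above.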