Enabling Efficient Model and Container Image Distribution in LLMaz with Dragonfly #361
Open
Labels
feature: Categorizes issue or PR as related to a new feature.
help wanted: Extra attention is needed.
needs-kind: Indicates a PR lacks a label and requires one.
needs-priority: Indicates a PR lacks a label and requires one.
needs-triage: Indicates an issue or PR lacks a label and requires one.
This task was added to the OSPP program; see https://summer-ospp.ac.cn/org/prodetail/257c80106?list=org&navpage=org for details. (The OSPP tag has since been removed because no one applied for this task.)
(1) Background: llmaz is a lightweight inference platform based on Kubernetes, focused on the efficient deployment and inference of large language models (https://github.com/InftyAI/llmaz). Dragonfly is an open-source P2P file distribution and image acceleration system for cloud-native environments that improves model and image distribution efficiency. llmaz has already integrated Manta as a lightweight model caching system, but its support for image and model distribution needs further optimization.
(2) Existing Work: llmaz supports multiple model providers (e.g., HuggingFace) and inference backends (e.g., vLLM), with Manta providing model caching and distribution. Manta leverages P2P technology for model sharding and preheating, but it focuses solely on models, not container images, and its functionality is still being refactored.
What would you like to be added:
(4) Desired Improvements: Integrate Dragonfly to optimize llmaz’s image and model distribution, providing unified P2P caching and acceleration. Following Manta’s lightweight design, the Dragonfly integration should keep resource usage low while improving speed and stability.
(5) Ultimate Goal: Implement efficient image and model distribution for llmaz using Dragonfly, enhance P2P caching and acceleration, and build a lightweight, versatile solution referencing Manta to improve deployment efficiency and reduce resource costs.
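As a sketch of what the integration could look like, Dragonfly's manager exposes an Open API for submitting preheat jobs, which could warm both model files and container images into the P2P cache ahead of a deployment. The helper name, endpoint usage, URLs, and exact field set below are illustrative assumptions, not a confirmed llmaz design; the job payload shape should be checked against the deployed Dragonfly version.

```python
import json

def build_preheat_job(url: str, kind: str, tag: str = "llmaz") -> dict:
    """Build a request body for a Dragonfly manager preheat job.

    kind is "file" for a model artifact or "image" for a container
    image. This is a hypothetical helper for illustration only.
    """
    return {
        "type": "preheat",
        "args": {
            "type": kind,  # "file" or "image"
            "url": url,    # artifact to warm into the P2P cache
            "tag": tag,    # peers sharing a tag can exchange pieces
        },
    }

# Example: preheat a (hypothetical) model shard and an inference image.
model_job = build_preheat_job(
    "https://example.com/models/llama/model-00001.safetensors", "file")
image_job = build_preheat_job(
    "https://registry.example.com/v2/vllm/vllm-openai/manifests/latest", "image")

print(json.dumps(model_job, indent=2))
```

A controller in llmaz could submit such payloads to the Dragonfly manager before scheduling inference pods, so that model pulls hit warm peers instead of the origin.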
Why is this needed:
llmaz currently lacks efficient container image distribution support. Model distribution relies on Manta, which is incomplete and does not handle images. Dragonfly’s P2P distribution capabilities are not yet integrated, resulting in slow image and model loading, impacting deployment efficiency.
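To make the motivation concrete, here is a rough back-of-the-envelope comparison of origin (registry or model hub) egress with and without P2P distribution. The model size and node count are illustrative assumptions, and the P2P case is idealized (each piece leaves the origin roughly once).

```python
def origin_egress_gb(artifact_gb: float, nodes: int, p2p: bool) -> float:
    """Estimate total data served by the origin.

    Direct pulls: every node downloads the full artifact from the origin.
    P2P: the origin serves about one full copy; peers serve the rest.
    """
    return artifact_gb * (1 if p2p else nodes)

model_gb, nodes = 15.0, 50  # e.g. a 15 GB model across 50 inference nodes
direct = origin_egress_gb(model_gb, nodes, p2p=False)   # 750.0 GB
with_p2p = origin_egress_gb(model_gb, nodes, p2p=True)  # 15.0 GB
print(f"direct: {direct} GB, P2P: {with_p2p} GB")
```

Even under less ideal peer availability, shifting most of this traffic onto the cluster's internal network is what drives the expected deployment speedup and cost reduction.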
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.
@carlory will be the mentor of this task.