diff --git a/docs/flagrelease_en/model_list.txt b/docs/flagrelease_en/model_list.txt
index a57e7d5..9c4288d 100644
--- a/docs/flagrelease_en/model_list.txt
+++ b/docs/flagrelease_en/model_list.txt
@@ -1,3 +1,4 @@
+FlagRelease/C2S-Scale-Gemma-2-27B-FlagOS
 FlagRelease/DeepSeek-R1-Distill-Qwen-32B-FlagOS-Cambricon
 FlagRelease/DeepSeek-R1-Distill-Qwen-32B-FlagOS-NVIDIA
 FlagRelease/DeepSeek-R1-FlagOS-Cambricon-BF16
@@ -30,6 +31,7 @@ FlagRelease/GLM-5-ascend-FlagOS
 FlagRelease/Hunyuan-A13B-Instruct-FlagOS
 FlagRelease/Kimi-K2-Instruct-FlagOS
 FlagRelease/Kimi-K2-Thinking-FlagOS
+FlagRelease/Kimi-Linear-48B-A3B-Instruct-nvidia-FlagOS
 FlagRelease/MiniCPM-V-4-FlagOS
 FlagRelease/MiniCPM-V-4-metax-FlagOS
 FlagRelease/MiniCPM-o-4.5-ascend-FlagOS
@@ -105,8 +107,6 @@ FlagRelease/RoboBrain2.5-8B-FlagOS
 FlagRelease/RoboBrain2.5-8B-ascend-FlagOS
 FlagRelease/Seed-OSS-36B-Instruct-FlagOS
 FlagRelease/TeleChat3-36B-Thinking-mthreads-FlagOS
-FlagRelease/gemma-3-1b-it-FlagOS
-FlagRelease/gemma-3-1b-it-plugin-FlagOS
 FlagRelease/gpt-oss-120b-FlagOS
 FlagRelease/grok-2-FlagOS
 FlagRelease/phi-4-FlagOS
diff --git a/docs/flagrelease_en/model_readmes/FlagRelease_C2S-Scale-Gemma-2-27B-FlagOS.md b/docs/flagrelease_en/model_readmes/FlagRelease_C2S-Scale-Gemma-2-27B-FlagOS.md
new file mode 100644
index 0000000..8b35603
--- /dev/null
+++ b/docs/flagrelease_en/model_readmes/FlagRelease_C2S-Scale-Gemma-2-27B-FlagOS.md
@@ -0,0 +1,48 @@
+---
+license: Apache License 2.0
+tags: []
+
+#model-type:
+##e.g. gpt, phi, llama, chatglm, baichuan
+#- gpt
+
+#domain:
+##e.g. nlp, cv, audio, multi-modal
+#- nlp
+
+#language:
+##List of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
+#- cn
+
+#metrics:
+##e.g. CIDEr, BLEU, ROUGE
+#- CIDEr
+
+#tags:
+##Custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
+#- pretrained
+
+#tools:
+##e.g. vllm, fastchat, llamacpp, AdaSeq
+#- vllm
+---
+### The contributors of this model have not provided a more detailed model introduction. The model files and weights are available on the "Model Files" page.
+#### You can download the model with the git clone command below or with the ModelScope SDK
+
+SDK download
+```bash
+# Install ModelScope
+pip install modelscope
+```
+```python
+# Download the model with the SDK
+from modelscope import snapshot_download
+model_dir = snapshot_download('FlagRelease/C2S-Scale-Gemma-2-27B-FlagOS')
+```
+Git download
+```
+# Download the model with Git
+git clone https://www.modelscope.cn/FlagRelease/C2S-Scale-Gemma-2-27B-FlagOS.git
+```
+
+If you are a contributor to this model, we invite you to complete the model card promptly, following the model contribution documentation.
\ No newline at end of file
diff --git a/docs/flagrelease_en/model_readmes/FlagRelease_Kimi-Linear-48B-A3B-Instruct-nvidia-FlagOS.md b/docs/flagrelease_en/model_readmes/FlagRelease_Kimi-Linear-48B-A3B-Instruct-nvidia-FlagOS.md
new file mode 100644
index 0000000..bb4f9c1
--- /dev/null
+++ b/docs/flagrelease_en/model_readmes/FlagRelease_Kimi-Linear-48B-A3B-Instruct-nvidia-FlagOS.md
@@ -0,0 +1,127 @@
+# Introduction
+Kimi-Linear-48B-A3B-Instruct is a high-efficiency large language model developed by MoonshotAI. Built on a hybrid linear attention architecture with 48B total parameters, it is optimized for long-context comprehension, multi-turn dialogue, and complex reasoning, and supports a context window of up to 1 million tokens.
+
+With a 3:1 structural ratio of Kimi Delta Attention to global MLA, the model greatly reduces KV cache usage and improves inference throughput while maintaining strong overall capability. It achieves strong results on multiple benchmarks, is natively compatible with the Transformers and vLLM frameworks, and can be deployed quickly for long-document parsing, knowledge question answering, and industrial conversational services.
+
+
+### Integrated Deployment
+- Out-of-the-box inference scripts with pre-configured hardware and software parameters
+- Released **FlagOS-Nvidia** container image supporting deployment within minutes
+### Consistency Validation
+- Rigorously evaluated through benchmark testing: performance and results from the FlagOS software stack are compared against the native stack on multiple public benchmarks.
+
+
+# Evaluation Results
+## Benchmark Result
+| Metrics | Kimi-Linear-48B-A3B-Instruct-nvidia-FlagOS-Nvidia-Origin | Kimi-Linear-48B-A3B-Instruct-nvidia-FlagOS-Nvidia-FlagOS |
+|---------------------|----------------------------------------------------------|----------------------------------------------------------|
+| aime | 0.4667 | 0.4667 |
+| musr_generative | 0.5926 | 0.5635 |
+| mmlu_pro | 0.515 | 0.5315 |
+| gpqa_generative_cot | 0.4295 | 0.4295 |
+| livebench_new | 0.5438 | 0.5178 |
+
+# User Guide
+## Environment Setup
+
+| Item | Version |
+|------------------|----------------------|
+| Docker Version | Docker version 24.0.0, build 98fdcd7 |
+| Operating System | Ubuntu 22.04.4 LTS (Jammy Jellyfish) |
+
+## Operation Steps
+
+### Download FlagOS Image
+```bash
+docker pull harbor.baai.ac.cn/external-cooperation/kimi-linear-48b-a3b-instruct-nvidia-tree_0.5.0_3.5-gems_5.0.2-vllm_0.13.0-plugin_0.1-cx_none-python_3.12.3-torch_2.9.0_cu128-pcp_cuda12.8-gpu_nvidia003-arc_amd64-driver_570.158.01:2605110300
+```
+
+### Download Open-source Model Weights
+```bash
+pip install modelscope
+modelscope download --model FlagRelease/Kimi-Linear-48B-A3B-Instruct-nvidia-FlagOS --local_dir /data/Kimi-Linear-48B-A3B-Instruct-nvidia-FlagOS
+```
+
+### Start the Container
+```bash
+docker run -itd --name=xxx --gpus=all --network=host -v /data:/data harbor.baai.ac.cn/external-cooperation/kimi-linear-48b-a3b-instruct-nvidia-tree_0.5.0_3.5-gems_5.0.2-vllm_0.13.0-plugin_0.1-cx_none-python_3.12.3-torch_2.9.0_cu128-pcp_cuda12.8-gpu_nvidia003-arc_amd64-driver_570.158.01:2605110300 sleep infinity
+
+docker exec -it xxx bash
+```
+### Start the Server
+```bash
+export VLLM_PLUGINS=fl
+export TRITON_ALL_BLOCKS_PARALLEL=1
+nohup vllm serve \
+--model /data/Kimi-Linear-48B-A3B-Instruct-nvidia-FlagOS/ \
+--served-model-name kimi-linear \
+--host 0.0.0.0 \
+--port 6677 \
+--trust-remote-code \
+--tensor-parallel-size 2 \
+--enforce-eager \
+> kimi-flagos.log 2>&1 &
+
+tail -f kimi-flagos.log
+```
+
+## Service Invocation
+### Invocation Script
+```bash
+curl http://localhost:6677/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "kimi-linear",
+    "messages": [{"role": "user", "content": "Hello"}]
+  }'
+```
+
+
+### AnythingLLM Integration Guide
+
+#### 1. Download & Install
+
+- Visit the official site: https://anythingllm.com/
+- Choose the appropriate version for your OS (Windows/macOS/Linux)
+- Follow the installation wizard to complete the setup
+
+#### 2. Configuration
+
+- Launch AnythingLLM
+- Open settings (bottom left, fourth tab)
+- Configure core LLM parameters
+- Click "Save Settings" to apply changes
+
+#### 3. Model Interaction
+
+After the model has finished loading:
+- Click **"New Conversation"**
+- Enter your question (e.g., “Explain the basics of quantum computing”)
+- Click the send button to get a response
+# Technical Overview
+**FlagOS** is a fully open-source system software stack designed to unify the "model–system–chip" layers and foster an open, collaborative ecosystem. It enables a “develop once, run anywhere” workflow across diverse AI accelerators, unlocking hardware performance, eliminating fragmentation among vendor-specific software stacks, and substantially lowering the cost of porting and maintaining AI workloads. With core technologies such as the **FlagScale** distributed training/inference framework (together with vllm-plugin-fl), the **FlagGems** universal operator library, the **FlagCX** communication library, and the **FlagTree** unified compiler, the **FlagRelease** platform leverages the **FlagOS** stack to automatically produce and release various combinations of \