diff --git a/README.md b/README.md
index d504176dd0..6034655f86 100644
--- a/README.md
+++ b/README.md
@@ -26,6 +26,7 @@ ______________________________________________________________________
 2025
+- \[2025/09\] TurboMind supports MXFP4 on NVIDIA GPUs starting from V100, achieving 1.5x the performance of vLLM on H800 for OpenAI gpt-oss models!
 - \[2025/06\] Comprehensive inference optimization for FP8 MoE Models
 - \[2025/06\] DeepSeek PD Disaggregation deployment is now supported through integration with [DLSlime](https://github.com/DeepLink-org/DLSlime) and [Mooncake](https://github.com/kvcache-ai/Mooncake). Huge thanks to both teams!
 - \[2025/04\] Enhance DeepSeek inference performance by integration deepseek-ai techniques: FlashMLA, DeepGemm, DeepEP, MicroBatch and eplb
@@ -149,6 +150,7 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
   • Phi-3.5-MoE (16x3.8B)
   • Phi-4-mini (3.8B)
   • MiniCPM3 (4B)
+  • gpt-oss (20B, 120B)
@@ -204,7 +206,7 @@ conda activate lmdeploy
 pip install lmdeploy
 ```
-The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
+The default prebuilt package is compiled on **CUDA 12.8** since v0.10.0.
 For more information on installing on CUDA 11+ platform, or for instructions on building from source, please refer to the [installation guide](docs/en/get_started/installation.md).

 ## Offline Batch Inference
diff --git a/README_ja.md b/README_ja.md
index 009d6749ad..3537e84935 100644
--- a/README_ja.md
+++ b/README_ja.md
@@ -23,10 +23,6 @@ ______________________________________________________________________

 ## 最新ニュース 🎉

-
-2025
-
-
 2024
diff --git a/README_zh-CN.md b/README_zh-CN.md
index 788995f1e4..e677ef2f7c 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -27,6 +27,7 @@ ______________________________________________________________________
 2025
+- 【2025年9月】TurboMind 引擎支持 MXFP4,适用于 NVIDIA V100 及以上 GPU。在 H800 上推理 OpenAI gpt-oss 模型,性能可达 vLLM 的 1.5 倍!
 - 【2025年6月】深度优化 FP8 MoE 模型推理
 - 【2025年6月】集成[DLSlime](https://github.com/DeepLink-org/DLSlime)和[Mooncake](https://github.com/kvcache-ai/Mooncake),实现DeepSeek PD分离部署,向两个团队表示诚挚的感谢!
 - 【2025年4月】集成deepseek-ai组件FlashMLA、DeepGemm、DeepEP、MicroBatch、eplb等,提升DeepSeek推理性能
@@ -150,6 +151,7 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
   • Phi-3.5-MoE (16x3.8B)
   • Phi-4-mini (3.8B)
   • MiniCPM3 (4B)
+  • gpt-oss (20B, 120B)
@@ -205,7 +207,7 @@ conda activate lmdeploy
 pip install lmdeploy
 ```
-自 v0.3.0 起,LMDeploy 预编译包默认基于 CUDA 12 编译。如果需要在 CUDA 11+ 下安装 LMDeploy,或者源码安装 LMDeploy,请参考[安装文档](docs/zh_cn/get_started/installation.md)
+自 v0.10.0 起,LMDeploy 预编译包默认基于 CUDA 12.8 编译。如果需要在 CUDA 11+ 下安装 LMDeploy,或者源码安装 LMDeploy,请参考[安装文档](docs/zh_cn/get_started/installation.md)

 ## 离线批处理
diff --git a/docs/en/get_started/installation.md b/docs/en/get_started/installation.md
index efe32dc054..923da3448b 100644
--- a/docs/en/get_started/installation.md
+++ b/docs/en/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:

 ```shell
-export LMDEPLOY_VERSION=0.9.2
+export LMDEPLOY_VERSION=0.10.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
@@ -51,7 +51,7 @@ DISABLE_TURBOMIND=1 pip install git+https://github.com/InternLM/lmdeploy.git
 If you prefer a specific version instead of the `main` branch of LMDeploy, you can specify it in your command:

 ```shell
-pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.9.2.zip
+pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.0.zip
 ```

 If you want to build LMDeploy with support for Ascend, Cambricon, or MACA, install LMDeploy with the corresponding `LMDEPLOY_TARGET_DEVICE` environment variable.
diff --git a/docs/en/supported_models/supported_models.md b/docs/en/supported_models/supported_models.md
index 38f02fb1fa..cb437fc541 100644
--- a/docs/en/supported_models/supported_models.md
+++ b/docs/en/supported_models/supported_models.md
@@ -47,6 +47,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 | GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
 | CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
 | Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
+| gpt-oss | 20B,120B | LLM | Yes | Yes | Yes | Yes |

 "-" means not verified yet.
diff --git a/docs/zh_cn/get_started/installation.md b/docs/zh_cn/get_started/installation.md
index 755c80245d..749b0919a7 100644
--- a/docs/zh_cn/get_started/installation.md
+++ b/docs/zh_cn/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
 默认的预构建包是在 **CUDA 12** 上编译的。如果需要 CUDA 11+ (>=11.3),你可以使用以下命令安装 lmdeploy:

 ```shell
-export LMDEPLOY_VERSION=0.9.2
+export LMDEPLOY_VERSION=0.10.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
@@ -51,7 +51,7 @@ DISABLE_TURBOMIND=1 pip install git+https://github.com/InternLM/lmdeploy.git
 如果您希望使用特定版本,而不是 LMDeploy 的 `main` 分支,可以在命令行中指定:

 ```shell
-pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.9.2.zip
+pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.0.zip
 ```

 如果您希望构建支持昇腾、寒武纪或沐熙的 LMDeploy,请使用相应的 `LMDEPLOY_TARGET_DEVICE` 环境变量进行安装。
diff --git a/docs/zh_cn/supported_models/supported_models.md b/docs/zh_cn/supported_models/supported_models.md
index aaebb1df1a..7c50be35fa 100644
--- a/docs/zh_cn/supported_models/supported_models.md
+++ b/docs/zh_cn/supported_models/supported_models.md
@@ -47,6 +47,7 @@
 | GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
 | CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
 | Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
+| gpt-oss | 20B,120B | LLM | Yes | Yes | Yes | Yes |

 “-” 表示还没有验证。
diff --git a/lmdeploy/version.py b/lmdeploy/version.py
index ea8b3053b2..9f36352280 100644
--- a/lmdeploy/version.py
+++ b/lmdeploy/version.py
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple

-__version__ = '0.9.2'
+__version__ = '0.10.0'
 short_version = __version__
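The final hunk bumps the `__version__` string in `lmdeploy/version.py` from `'0.9.2'` to `'0.10.0'`. Dotted version strings like these must be compared numerically, not lexically, since `'0.10.0' < '0.9.2'` as plain strings. A minimal sketch of such a parser (the helper below is illustrative, not copied from the repository):

```python
from typing import Tuple, Union


def parse_version_info(version_str: str) -> Tuple[Union[int, str], ...]:
    """Split a version string like '0.10.0' into (0, 10, 0).

    Numeric components become ints so tuples compare numerically;
    any non-numeric component (e.g. 'rc1') is kept as a string.
    """
    parts = []
    for part in version_str.split('.'):
        parts.append(int(part) if part.isdigit() else part)
    return tuple(parts)


# Tuples order correctly where raw strings would not:
assert parse_version_info('0.10.0') > parse_version_info('0.9.2')
assert not ('0.10.0' > '0.9.2')  # lexical comparison gets this wrong
```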