Merged
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -26,6 +26,7 @@ ______________________________________________________________________
<details open>
<summary><b>2025</b></summary>

- \[2025/09\] TurboMind supports MXFP4 on NVIDIA GPUs starting from V100, achieving 1.5x the performance of vLLM on H800 for OpenAI gpt-oss models!
- \[2025/06\] Comprehensive inference optimization for FP8 MoE Models
- \[2025/06\] DeepSeek PD Disaggregation deployment is now supported through integration with [DLSlime](https://github.com/DeepLink-org/DLSlime) and [Mooncake](https://github.com/kvcache-ai/Mooncake). Huge thanks to both teams!
- \[2025/04\] Enhance DeepSeek inference performance by integrating deepseek-ai techniques: FlashMLA, DeepGemm, DeepEP, MicroBatch, and eplb
@@ -149,6 +150,7 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
<li>Phi-3.5-MoE (16x3.8B)</li>
<li>Phi-4-mini (3.8B)</li>
<li>MiniCPM3 (4B)</li>
<li>gpt-oss (20B, 120B)</li>
</ul>
</td>
<td>
@@ -204,7 +206,7 @@ conda activate lmdeploy
pip install lmdeploy
```

The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
The default prebuilt package is compiled on **CUDA 12.8** since v0.10.0.
For more information on installing on CUDA 11+ platform, or for instructions on building from source, please refer to the [installation guide](docs/en/get_started/installation.md).
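Since the prebuilt wheel targets a specific CUDA toolkit, a quick numeric comparison of dotted version strings can tell you whether the default package applies to your machine. This is an illustrative sketch, not part of LMDeploy; the helper names and the `12.8` threshold here simply mirror the note above:

```python
# Illustrative helpers (not part of LMDeploy): compare dotted CUDA
# version strings numerically instead of lexically ('11.8' < '12.8').

def version_tuple(version: str) -> tuple:
    """Parse a dotted version string like '12.8' into a comparable tuple."""
    return tuple(int(part) for part in version.split('.'))

def default_wheel_applies(local_cuda: str, wheel_cuda: str = '12.8') -> bool:
    """True if the local CUDA toolkit is at least the wheel's target version."""
    return version_tuple(local_cuda) >= version_tuple(wheel_cuda)

print(default_wheel_applies('12.8'))  # matching toolkit -> True
print(default_wheel_applies('11.8'))  # older toolkit -> False, use the cu118 wheel
```

If the check fails, follow the cu118 instructions in the installation guide instead of the default `pip install lmdeploy`.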

## Offline Batch Inference
4 changes: 0 additions & 4 deletions README_ja.md
@@ -23,10 +23,6 @@ ______________________________________________________________________

## Latest News 🎉

<details open>
<summary><b>2025</b></summary>
</details>

<details close>
<summary><b>2024</b></summary>

4 changes: 3 additions & 1 deletion README_zh-CN.md
@@ -27,6 +27,7 @@ ______________________________________________________________________
<summary><b>2025</b></summary>
</details>

- \[2025/09\] The TurboMind engine supports MXFP4 on NVIDIA V100 and later GPUs, reaching 1.5x the performance of vLLM when serving OpenAI gpt-oss models on H800!
- \[2025/06\] Comprehensive inference optimization for FP8 MoE models
- \[2025/06\] DeepSeek PD disaggregated deployment is enabled through integration with [DLSlime](https://github.com/DeepLink-org/DLSlime) and [Mooncake](https://github.com/kvcache-ai/Mooncake). Sincere thanks to both teams!
- \[2025/04\] Improved DeepSeek inference performance by integrating the deepseek-ai components FlashMLA, DeepGemm, DeepEP, MicroBatch, and eplb
@@ -150,6 +151,7 @@ The LMDeploy TurboMind engine delivers outstanding inference performance across models of various scales
<li>Phi-3.5-MoE (16x3.8B)</li>
<li>Phi-4-mini (3.8B)</li>
<li>MiniCPM3 (4B)</li>
<li>gpt-oss (20B, 120B)</li>
</ul>
</td>
<td>
@@ -205,7 +207,7 @@ conda activate lmdeploy
pip install lmdeploy
```

Since v0.3.0, the LMDeploy prebuilt package has been compiled on CUDA 12 by default. To install LMDeploy on a CUDA 11+ platform, or to build it from source, refer to the [installation guide](docs/zh_cn/get_started/installation.md).
Since v0.10.0, the LMDeploy prebuilt package has been compiled on CUDA 12.8 by default. To install LMDeploy on a CUDA 11+ platform, or to build it from source, refer to the [installation guide](docs/zh_cn/get_started/installation.md).

## Offline Batch Inference

4 changes: 2 additions & 2 deletions docs/en/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.9.2
export LMDEPLOY_VERSION=0.10.0
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
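The shell snippet above builds the wheel URL from two variables; the same interpolation can be sketched in Python, using only the version, Python tag, and URL scheme already shown above (nothing here is an additional LMDeploy API):

```python
# Reconstruct the cu118 wheel URL exactly as the shell snippet above does.
LMDEPLOY_VERSION = '0.10.0'
PYTHON_VERSION = '310'  # CPython 3.10 ABI tag

wheel_url = (
    'https://github.com/InternLM/lmdeploy/releases/download/'
    f'v{LMDEPLOY_VERSION}/lmdeploy-{LMDEPLOY_VERSION}+cu118'
    f'-cp{PYTHON_VERSION}-cp{PYTHON_VERSION}-manylinux2014_x86_64.whl'
)
print(wheel_url)
```

The `+cu118` local-version suffix and the doubled `cp310-cp310` tags must match your CUDA build and Python interpreter, otherwise pip will reject the wheel.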
@@ -51,7 +51,7 @@ DISABLE_TURBOMIND=1 pip install git+https://github.com/InternLM/lmdeploy.git
If you prefer a specific version instead of the `main` branch of LMDeploy, you can specify it in your command:

```shell
pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.9.2.zip
pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.0.zip
```

If you want to build LMDeploy with support for Ascend, Cambricon, or MACA, install LMDeploy with the corresponding `LMDEPLOY_TARGET_DEVICE` environment variable.
1 change: 1 addition & 0 deletions docs/en/supported_models/supported_models.md
@@ -47,6 +47,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
| gpt-oss | 20B,120B | LLM | Yes | Yes | Yes | Yes |

"-" means not verified yet.

4 changes: 2 additions & 2 deletions docs/zh_cn/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:

```shell
export LMDEPLOY_VERSION=0.9.2
export LMDEPLOY_VERSION=0.10.0
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
@@ -51,7 +51,7 @@ DISABLE_TURBOMIND=1 pip install git+https://github.com/InternLM/lmdeploy.git
If you prefer a specific version instead of the `main` branch of LMDeploy, you can specify it in your command:

```shell
pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.9.2.zip
pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.0.zip
```

To build LMDeploy with support for Ascend, Cambricon, or MACA, install it with the corresponding `LMDEPLOY_TARGET_DEVICE` environment variable.
1 change: 1 addition & 0 deletions docs/zh_cn/supported_models/supported_models.md
@@ -47,6 +47,7 @@
| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
| gpt-oss | 20B,120B | LLM | Yes | Yes | Yes | Yes |

"-" means not verified yet.

2 changes: 1 addition & 1 deletion lmdeploy/version.py
@@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple

__version__ = '0.9.2'
__version__ = '0.10.0'
short_version = __version__
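The hunk ends at `short_version`, but the `Tuple` import suggests the rest of `version.py` also exposes the version as an integer tuple, as OpenMMLab projects commonly do. A minimal sketch of such a helper; `parse_version_info` and its exact splitting rules are assumptions, not necessarily what the actual file contains:

```python
from typing import Tuple

__version__ = '0.10.0'
short_version = __version__

def parse_version_info(version_str: str) -> Tuple:
    """Split '0.10.0' into (0, 10, 0); non-numeric segments such as
    '0rc1' are kept as strings so pre-releases still compare sensibly.
    Hypothetical helper, sketched from the Tuple import above."""
    parts = []
    for segment in version_str.split('.'):
        parts.append(int(segment) if segment.isdigit() else segment)
    return tuple(parts)

version_info = parse_version_info(__version__)
print(version_info)  # (0, 10, 0)
```

A tuple form lets callers write checks like `version_info >= (0, 10)` instead of comparing version strings lexically.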

