
build(docker): Try to optimize docker #3779

Merged
lvhan028 merged 11 commits into InternLM:main from windreamer:optimize_docker
Aug 13, 2025

Conversation

@windreamer
Collaborator

@windreamer windreamer commented Jul 26, 2025

Motivation

This PR aims to improve the efficiency and future compatibility of our Docker environment.

  1. Reduce image size: The original Docker image was approximately 14GB, which caused slower pull times and increased storage usage.
  2. Enable support for Blackwell GPUs: By updating to the CUDA 12.8.1 base image, this PR prepares the environment for the next-generation Blackwell architecture.

Modification

Multi-stage build:

  • Refactored the Dockerfile using a multi-stage build approach.
  • Reduced the final image size from ~14GB to ~8GB without losing functionality.
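
Roughly, a multi-stage build compiles in a heavy `devel` image and copies only the resulting artifacts into a slim `runtime` image. A minimal sketch of the pattern (stage names, tags, and paths are illustrative assumptions, not the PR's actual Dockerfile):

```dockerfile
# Illustrative multi-stage sketch; names, tags, and paths are assumptions.
# Stage 1: build wheels with the full CUDA toolchain (large, discarded later)
FROM nvidia/cuda:12.8.1-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip \
    && rm -rf /var/lib/apt/lists/*
COPY . /src
RUN python3 -m pip wheel /src -w /wheels

# Stage 2: final image keeps only the CUDA runtime and the built wheels
FROM nvidia/cuda:12.8.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /wheels /wheels
RUN python3 -m pip install /wheels/*.whl && rm -rf /wheels
```

Because the `devel` toolchain never reaches the final stage, the shipped image carries only runtime layers, which is where a reduction like ~14GB to ~8GB comes from.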

CUDA 12.8.1 upgrade:

  • Switched the base image to nvidia/cuda:12.8.1 to ensure compatibility with Blackwell GPUs.
  • Adjusted dependencies and build configurations to work seamlessly with the new CUDA version.

Result

Current docker image size

REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
lmdeploy     latest    952bdce6431e   20 seconds ago   7.61GB

Report from dive

Analyzing image...
  efficiency: 96.2069 %
  wastedBytes: 322126726 bytes (322 MB)
  userWastedPercent: 4.2792 %

Checklist

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  • If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  • The documentation has been modified accordingly, like docstring or example tutorials.

@windreamer windreamer force-pushed the optimize_docker branch 3 times, most recently from e27981b to c19ccc3 Compare July 31, 2025 00:39
@windreamer
Collaborator Author

Temporarily disable DeepEP in Python 3.9 due to this issue: deepseek-ai/DeepEP#350

@windreamer windreamer force-pushed the optimize_docker branch 4 times, most recently from c59b58e to 441216a Compare August 5, 2025 09:52
@windreamer
Collaborator Author

> Temporarily disable DeepEP in Python 3.9 due to this issue: deepseek-ai/DeepEP#350

fixed now

@lvhan028 lvhan028 self-requested a review August 8, 2025 06:37
@lvhan028
Collaborator

lvhan028 commented Aug 8, 2025

My usual way of compiling the turbomind engine failed, since cmake is missing from the docker image.

mkdir build
cd build
../generate.sh make

Maybe pip install cmake in the image?

@windreamer
Collaborator Author

> My usual way of compiling the turbomind engine failed, since cmake is missing from the docker image:
>
>     mkdir build
>     cd build
>     ../generate.sh make
>
> Maybe pip install cmake in the image?

OK, I may leave it as future work, may I?

@CUHKSZzxy
Collaborator

CUHKSZzxy commented Aug 11, 2025

  1. DeepEP-related functionalities were tested using their official testing scripts; they look good.

  2. An error occurs when warming up DeepGEMM, due to an API mismatch.

DeepGEMM has modified the API after

We should merge this PR after the following fix.

Trace:

ray.exceptions.RayTaskError(ModuleNotFoundError): ray::RayWorkerWrapper.build_graph_runner() (pid=18354, ip=10.130.8.137, actor_id=e403d329ae8eabda5441dc6801000000, repr=<lmdeploy.pytorch.engine.executor.ray_executor.RayWorkerWrapper object at 0x7fca38a4a770>)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/base_worker.py", line 119, in build_graph_runner
    self.model_agent.build_graph_runner()
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 842, in build_graph_runner
    self.patched_model = backend.build_graph_runner(self.patched_model,
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/op_backend.py", line 195, in build_graph_runner
    get_warmup_manager().warmup(warmup_meta)
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/warmup_manager.py", line 45, in warmup
    func(warmup_meta)
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/blockedf8_modules.py", line 95, in warmup
    from deep_gemm.jit_kernels.utils import get_m_alignment_for_contiguous_layout
ModuleNotFoundError: No module named 'deep_gemm.jit_kernels'
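
A hedged sketch of how a warmup path could tolerate this kind of module relocation; the import path is taken from the traceback above, and the fallback behaviour (returning None and skipping the helper) is an assumption, not lmdeploy's actual fix:

```python
# Sketch: guard against the deep_gemm.jit_kernels relocation seen above.
# Import path is from the traceback; the None fallback is an assumption.
def resolve_m_alignment_helper():
    """Return the DeepGEMM helper if the installed version still exposes it."""
    try:
        from deep_gemm.jit_kernels.utils import get_m_alignment_for_contiguous_layout
        return get_m_alignment_for_contiguous_layout
    except ModuleNotFoundError:
        # Newer DeepGEMM revisions moved jit_kernels; callers would need to
        # pin DeepGEMM to a compatible commit or skip this warmup path.
        return None

helper = resolve_m_alignment_helper()
```

In practice the cleaner remedy, as discussed below in the thread, is to pin DeepGEMM to a commit whose API matches what lmdeploy imports.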

Re-produce:

# start proxy server
lmdeploy serve proxy --server-name 172.16.4.52 --server-port 8000 --routing-strategy 'min_expected_latency' --serving-strategy Hybrid --log-level INFO


# node 0
LMDEPLOY_DP_MASTER_ADDR=172.16.4.52 \
LMDEPLOY_DP_MASTER_PORT=29555 \
lmdeploy serve api_server \
    Qwen/Qwen3-235B-A22B-FP8 \
    --backend pytorch \
    --tp 1 \
    --dp 4 \
    --ep 4 \
    --proxy-url http://172.16.4.52:8000 \
    --nnodes 1 \
    --node-rank 0 \
    --log-level INFO
  3. Misc
    As a user, I have some suggestions for potential improvements, but we can put them in another PR:

  • Lack of vim?
    If I want to modify something quickly in Docker, it would be convenient to have something like Vim.

  • Set a workspace in Docker?
    In the vLLM and SGLang Docker images, when starting and entering the container, the default directory is a workspace (e.g., SGLang).

When entering the current LMDeploy Docker container, the default directory is /. If we want to perform benchmarking:

python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json

benchmark/profile_restful_api.py is not readily available. It would be better to have a default LMDeploy workspace containing the code, so the above benchmarking command could be executed without further adjustment.
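
If the workspace suggestion were adopted, it could look like this in the Dockerfile — the repository URL is the real one, but the target directory and pinning strategy are assumptions:

```dockerfile
# Hypothetical workspace setup; directory name and pinning are assumptions.
RUN git clone --depth 1 https://github.com/InternLM/lmdeploy.git /opt/lmdeploy
WORKDIR /opt/lmdeploy
# Entering the container then lands in a directory where
#   python3 benchmark/profile_restful_api.py ...
# works without copying files in first.
```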

@windreamer
Collaborator Author

windreamer commented Aug 11, 2025

> (quoting @CUHKSZzxy's review above in full)

Thank you for your thorough and detailed review! Here are my follow-ups:

  • I will try to pin DeepGEMM to a commit with the old API and verify it again.
  • For the dev & debug experience, I face a dilemma: we cannot get both minimal image size and a good dev & debug experience at the same time. So I am trying to split the images into two types: one for runtime and one for dev.

But I still want to know:

  • For the dev image, what do you expect in it? Vim for sure; anything else?
  • Do you expect a fixed version of LMDeploy inside it? Or do you prefer to mount the source from outside?

@windreamer
Collaborator Author

@CUHKSZzxy I think you can now use this command line to get a dev image (currently the code workspace needs to be mounted from outside):

docker build . -f docker/Dockerfile --build-arg IMAGE_TYPE=dev
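
Presumably the `IMAGE_TYPE` build argument selects which stage becomes the final image; a common pattern for this (a sketch under assumed stage names, not the PR's actual Dockerfile) is:

```dockerfile
# Sketch: pick the final stage via a build argument (stage names are assumptions)
FROM nvidia/cuda:12.8.1-runtime-ubuntu22.04 AS runtime
# ... minimal runtime layers ...

FROM runtime AS dev
# dev-only extras layered on top of the runtime image
RUN apt-get update && apt-get install -y --no-install-recommends vim cmake \
    && rm -rf /var/lib/apt/lists/*

ARG IMAGE_TYPE=runtime
FROM ${IMAGE_TYPE} AS final
```

`--build-arg IMAGE_TYPE=dev` resolves `FROM ${IMAGE_TYPE}` to the `dev` stage, while the default build stays minimal.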

@windreamer windreamer force-pushed the optimize_docker branch 2 times, most recently from f9a56e7 to 49004a9 Compare August 12, 2025 02:10
@windreamer
Collaborator Author

Now that #3827 is merged, I have rebased this PR and reverted the commit that pinned DeepGEMM to a fixed commit.

@CUHKSZzxy
Collaborator

CUHKSZzxy commented Aug 13, 2025

Thanks for the timely fix, overall LGTM!

Some replies

> For the dev image, what do you expect in it? Vim for sure; anything else?

A dev image seems a good choice to balance image size and convenience. As for what's inside it, I think it depends on personal preference? Maybe we can land this PR first and improve the dev image as development needs arise.

> Do you expect a fixed version of LMDeploy inside it? Or do you prefer to mount the source from outside?

For the non-dev image, I would expect a fixed version of LMDeploy, just like previous lmdeploy docker images.
For the dev image, I think it's more common to mount the source from outside.

@lvhan028 lvhan028 merged commit fbdd668 into InternLM:main Aug 13, 2025
25 checks passed
@windreamer windreamer deleted the optimize_docker branch August 13, 2025 13:13