
build(docker): Try to optimize docker #3779

Merged
lvhan028 merged 11 commits into InternLM:main from windreamer:optimize_docker
Aug 13, 2025

Conversation

@windreamer
Collaborator

@windreamer windreamer commented Jul 26, 2025

Motivation

This PR aims to improve the efficiency and future compatibility of our Docker environment.

  1. Reduce image size: The original Docker image was approximately 14GB, which caused slower pull times and increased storage usage.
  2. Enable support for Blackwell GPUs: By updating to the CUDA 12.8.1 base image, this PR prepares the environment for the next-generation Blackwell architecture.

Modification

Multi-stage build:

  • Refactored the Dockerfile using a multi-stage build approach.
  • Reduced the final image size from ~14GB to ~8GB without losing functionality.
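
Roughly, a multi-stage build compiles in a heavy `devel` image and copies only the resulting artifacts into a slim `runtime` image. A minimal sketch of the pattern (stage names, tags, and paths are illustrative assumptions, not the PR's actual Dockerfile):

```dockerfile
# Illustrative multi-stage sketch; names, tags, and paths are assumptions.
# Stage 1: build wheels with the full CUDA toolchain (large, discarded later)
FROM nvidia/cuda:12.8.1-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip \
    && rm -rf /var/lib/apt/lists/*
COPY . /src
RUN python3 -m pip wheel /src -w /wheels

# Stage 2: final image keeps only the CUDA runtime and the built wheels
FROM nvidia/cuda:12.8.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /wheels /wheels
RUN python3 -m pip install /wheels/*.whl && rm -rf /wheels
```

Because the `devel` toolchain never reaches the final stage, the shipped image carries only runtime layers, which is where a reduction like ~14GB to ~8GB comes from.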

CUDA 12.8.1 upgrade:

  • Switched the base image to nvidia/cuda:12.8.1 to ensure compatibility with Blackwell GPUs.
  • Adjusted dependencies and build configurations to work seamlessly with the new CUDA version.

Result

Current docker image size

REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
lmdeploy     latest    952bdce6431e   20 seconds ago   7.61GB

Report from dive

Analyzing image...
  efficiency: 96.2069 %
  wastedBytes: 322126726 bytes (322 MB)
  userWastedPercent: 4.2792 %

Checklist

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  • If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  • The documentation has been modified accordingly, like docstring or example tutorials.

@windreamer windreamer force-pushed the optimize_docker branch 3 times, most recently from e27981b to c19ccc3 Compare July 31, 2025 00:39
@windreamer
Collaborator Author

Temporarily disable DeepEP in Python 3.9 due to this issue: deepseek-ai/DeepEP#350

@windreamer windreamer force-pushed the optimize_docker branch 4 times, most recently from c59b58e to 441216a Compare August 5, 2025 09:52
@windreamer
Collaborator Author

> Temporarily disable DeepEP in Python 3.9 due to this issue: deepseek-ai/DeepEP#350

fixed now

@lvhan028 lvhan028 self-requested a review August 8, 2025 06:37
@lvhan028
Collaborator

lvhan028 commented Aug 8, 2025

My usual way of compiling the turbomind engine failed, since cmake is missing from the docker image.

mkdir build
cd build
../generate.sh make

Maybe pip install cmake in the image?

@windreamer
Collaborator Author

> My usual way of compiling the turbomind engine failed, since cmake is missing from the docker image:
>
>     mkdir build
>     cd build
>     ../generate.sh make
>
> Maybe pip install cmake in the image?

OK, I may leave it as future work, may I?

@CUHKSZzxy
Collaborator

CUHKSZzxy commented Aug 11, 2025

  1. DeepEP-related functionalities were tested using their official testing scripts; they look good.

  2. An error occurs when warming up DeepGEMM, due to an API mismatch.

DeepGEMM has modified the API after

We should merge this PR after the following fix.

Trace:

ray.exceptions.RayTaskError(ModuleNotFoundError): ray::RayWorkerWrapper.build_graph_runner() (pid=18354, ip=10.130.8.137, actor_id=e403d329ae8eabda5441dc6801000000, repr=<lmdeploy.pytorch.engine.executor.ray_executor.RayWorkerWrapper object at 0x7fca38a4a770>)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/executor/base_worker.py", line 119, in build_graph_runner
    self.model_agent.build_graph_runner()
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 842, in build_graph_runner
    self.patched_model = backend.build_graph_runner(self.patched_model,
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/op_backend.py", line 195, in build_graph_runner
    get_warmup_manager().warmup(warmup_meta)
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/warmup_manager.py", line 45, in warmup
    func(warmup_meta)
  File "/opt/py3/lib/python3.10/site-packages/lmdeploy/pytorch/backends/cuda/blockedf8_modules.py", line 95, in warmup
    from deep_gemm.jit_kernels.utils import get_m_alignment_for_contiguous_layout
ModuleNotFoundError: No module named 'deep_gemm.jit_kernels'
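
A hedged sketch of how a warmup path could tolerate this kind of module relocation; the import path is taken from the traceback above, and the fallback behaviour (returning None and skipping the helper) is an assumption, not lmdeploy's actual fix:

```python
# Sketch: guard against the deep_gemm.jit_kernels relocation seen above.
# Import path is from the traceback; the None fallback is an assumption.
def resolve_m_alignment_helper():
    """Return the DeepGEMM helper if the installed version still exposes it."""
    try:
        from deep_gemm.jit_kernels.utils import get_m_alignment_for_contiguous_layout
        return get_m_alignment_for_contiguous_layout
    except ModuleNotFoundError:
        # Newer DeepGEMM revisions moved jit_kernels; callers would need to
        # pin DeepGEMM to a compatible commit or skip this warmup path.
        return None

helper = resolve_m_alignment_helper()
```

In practice the cleaner remedy, as discussed below in the thread, is to pin DeepGEMM to a commit whose API matches what lmdeploy imports.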

Re-produce:

# start proxy server
lmdeploy serve proxy --server-name 172.16.4.52 --server-port 8000 --routing-strategy 'min_expected_latency' --serving-strategy Hybrid --log-level INFO


# node 0
LMDEPLOY_DP_MASTER_ADDR=172.16.4.52 \
LMDEPLOY_DP_MASTER_PORT=29555 \
lmdeploy serve api_server \
    Qwen/Qwen3-235B-A22B-FP8 \
    --backend pytorch \
    --tp 1 \
    --dp 4 \
    --ep 4 \
    --proxy-url http://172.16.4.52:8000 \
    --nnodes 1 \
    --node-rank 0 \
    --log-level INFO
  3. Misc
    As a user, I have some suggestions for potential improvements, but we can put them in another PR:

  • Lack of vim?
    If I want to modify something quickly in Docker, it would be convenient to have something like Vim.

  • Set a workspace in Docker?
    In the vLLM and SGLang Docker images, when starting and entering the container, the default directory is a workspace (e.g., SGLang).

When entering the current LMDeploy Docker container, the default directory is /. If we want to perform benchmarking:

python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json

benchmark/profile_restful_api.py is not readily available. It would be better to have a default LMDeploy workspace containing the code, so the above benchmarking command could be executed without further adjustment.
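
If the workspace suggestion were adopted, it could look like this in the Dockerfile — the repository URL is the real one, but the target directory and pinning strategy are assumptions:

```dockerfile
# Hypothetical workspace setup; directory name and pinning are assumptions.
RUN git clone --depth 1 https://github.com/InternLM/lmdeploy.git /opt/lmdeploy
WORKDIR /opt/lmdeploy
# Entering the container then lands in a directory where
#   python3 benchmark/profile_restful_api.py ...
# works without copying files in first.
```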

@windreamer
Collaborator Author

windreamer commented Aug 11, 2025

> (quoting @CUHKSZzxy's review above in full)

Thank you for your thorough and detailed review! Here are my follow-ups:

  • I will try to pin DeepGEMM to a commit with the old API and verify it again.
  • For the dev & debug experience, I face a dilemma: we cannot get both minimal image size and a good dev & debug experience at the same time. So I am trying to split the images into two types: one for runtime and one for dev.

But I still want to know:

  • For the dev image, what do you expect in it? Vim for sure; anything else?
  • Do you expect a fixed version of LMDeploy inside it? Or do you prefer to mount the source from outside?

@windreamer
Collaborator Author

@CUHKSZzxy I think you can now use this command line to get a dev image (currently the code workspace needs to be mounted from outside):

docker build . -f docker/Dockerfile --build-arg IMAGE_TYPE=dev
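
Presumably the `IMAGE_TYPE` build argument selects which stage becomes the final image; a common pattern for this (a sketch under assumed stage names, not the PR's actual Dockerfile) is:

```dockerfile
# Sketch: pick the final stage via a build argument (stage names are assumptions)
FROM nvidia/cuda:12.8.1-runtime-ubuntu22.04 AS runtime
# ... minimal runtime layers ...

FROM runtime AS dev
# dev-only extras layered on top of the runtime image
RUN apt-get update && apt-get install -y --no-install-recommends vim cmake \
    && rm -rf /var/lib/apt/lists/*

ARG IMAGE_TYPE=runtime
FROM ${IMAGE_TYPE} AS final
```

`--build-arg IMAGE_TYPE=dev` resolves `FROM ${IMAGE_TYPE}` to the `dev` stage, while the default build stays minimal.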

@windreamer windreamer force-pushed the optimize_docker branch 2 times, most recently from f9a56e7 to 49004a9 Compare August 12, 2025 02:10
@windreamer
Collaborator Author

Now that #3827 is merged, I have rebased this PR and reverted the commit that pinned DeepGEMM to a fixed commit.

@CUHKSZzxy
Collaborator

CUHKSZzxy commented Aug 13, 2025

Thanks for the timely fix, overall LGTM!

Some replies

> For the dev image, what do you expect in it? Vim for sure; anything else?

A dev image seems a good choice to balance image size and convenience. As for what's inside it, I think it depends on personal preference? Maybe we can land this PR first and improve the dev image as development needs arise.

> Do you expect a fixed version of LMDeploy inside it? Or do you prefer to mount the source from outside?

For the non-dev image, I would expect a fixed version of LMDeploy, just like previous lmdeploy docker images.
For the dev image, I think it's more common to mount the source from outside.

@lvhan028 lvhan028 merged commit fbdd668 into InternLM:main Aug 13, 2025
25 checks passed
@windreamer windreamer deleted the optimize_docker branch August 13, 2025 13:13