
feat: add vLLM ROCm Docker variant for AMD GPUs#917

Merged
ericcurtin merged 1 commit into docker:main from dmedovich:feat/vllm-rocm-support on May 16, 2026

Conversation

dmedovich (Contributor) commented May 15, 2026

Fixes #412

Adds a final-vllm-rocm Docker build target that runs vLLM on AMD GPUs using upstream vLLM (no fork, no third-party wheels), per the direction in the issue.

Approach

Builds upstream vLLM from source on top of rocm/vllm-dev:base — the same base image vLLM's own docker/Dockerfile.rocm uses. That base already ships PyTorch ROCm, Triton ROCm, flash-attention ROCm, and the ROCm SDK pre-compiled by AMD. We just clone vllm@v${VLLM_VERSION} and build the wheel.
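
Roughly, the build stage looks like the following. This is a minimal sketch, not the literal Dockerfile in this PR; the clone URL and the `pip wheel` invocation are illustrative assumptions:

```dockerfile
# Sketch of the source-build stage, assuming the standard upstream repo URL.
FROM rocm/vllm-dev:base AS vllm-rocm-build

# Tagged upstream release to build; passed in at build time.
ARG VLLM_VERSION

# Build a wheel against the PyTorch/Triton/flash-attention ROCm stack that
# the base image already ships, then install it.
RUN git clone --depth 1 --branch "v${VLLM_VERSION}" \
        https://github.com/vllm-project/vllm.git /opt/vllm-src \
    && cd /opt/vllm-src \
    && python3 -m pip wheel --no-build-isolation --no-deps -w /tmp/wheels . \
    && python3 -m pip install /tmp/wheels/vllm-*.whl
```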

The runtime stage mirrors the /opt/vllm-env layout that the existing vLLM backend in pkg/inference/backends/vllm/vllm.go expects (via symlinks, since rocm/vllm-dev:base installs Python deps system-wide).
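
For illustration, the symlink approach can be as small as the sketch below; the exact binary paths inside rocm/vllm-dev:base are assumptions, and the only real contract is the /opt/vllm-env prefix that the Go backend resolves:

```dockerfile
# Sketch of the runtime stage: expose the system-wide install under the
# /opt/vllm-env layout that pkg/inference/backends/vllm/vllm.go expects.
FROM rocm/vllm-dev:base AS final-vllm-rocm

RUN mkdir -p /opt/vllm-env/bin \
    && ln -s "$(command -v python3)" /opt/vllm-env/bin/python3 \
    && ln -s "$(command -v vllm)" /opt/vllm-env/bin/vllm
```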

New targets

  • make docker-build-vllm-rocm — build the image
  • make docker-run-vllm-rocm — build + run

LLAMA_SERVER_VARIANT=rocm is passed so the existing logic restricts DOCKER_BUILD_PLATFORMS to linux/amd64.

Trade-offs to flag

  1. The ROCm image does not include llama.cpp. The CUDA vllm stage builds on top of llamacpp and inherits the llama-server binary. For ROCm, rocm/vllm-dev:base and the llama.cpp ROCm image use incompatible ROCm runtime / PyTorch versions, so vllm-rocm is a vLLM-only artifact. Happy to revisit if you'd prefer a combined image.
  2. Build is heavy. ~30–60 min (HIP kernel compilation), ~12–15 GB final image — likely needs its own CI treatment.
  3. PYTORCH_ROCM_ARCH defaults to gfx90a;gfx942;gfx1100;gfx1101 (MI200/MI300 + RDNA3 7900/7800) — what upstream vLLM officially supports. Exposed as a Dockerfile ARG, overridable at build time (see the sketch below).
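
A minimal sketch of how that override is wired and used; the `docker build` invocation in the comment is an assumption about the target name, not a documented command:

```dockerfile
# Default GPU targets follow upstream vLLM's officially supported list.
# Other architectures can be requested at build time, e.g.:
#   docker build --build-arg PYTORCH_ROCM_ARCH="gfx1030" --target final-vllm-rocm .
ARG PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx1100;gfx1101"
ENV PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}
```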

Tested

  • go build ./..., go test ./..., go vet, golangci-lint — clean
  • make help shows the new targets
  • ❌ Did not run make docker-build-vllm-rocm end-to-end. I don't have a supported GPU (only an RX 6600 = gfx1032, not in vLLM's official ROCm targets), so a runtime test wouldn't be informative even if the build succeeded.

Would appreciate a sanity check on the architectural choice (vLLM-only ROCm image vs combined). Happy to adjust.

History

The original commit (76db8d3, force-pushed) installed vLLM via pip from wheels.vllm.ai/${VLLM_VERSION}/rocm63, but that URL doesn't exist — vLLM only publishes CUDA wheels there. Rewrote to the source-build approach above.

sourcery-ai (Bot) left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • The vLLM ROCm Docker stage duplicates most of the CUDA vLLM setup; consider factoring the common vLLM/uv/venv installation into a shared intermediate stage or script (sketched below) to keep the two variants in sync more easily.
  • In the vLLM ROCm Docker stage, you may want to align version configuration with the rest of the image (e.g., via .versions or shared ARGs) so VLLM_ROCM_VERSION and PYTORCH_ROCM_VERSION don’t silently drift from other ROCm-related components.
  • For the ROCm image’s apt install line, consider adding --no-install-recommends to reduce image size and keep the dependency set tighter, especially since this base is already quite large.
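
A minimal sketch of the shared-script variant of the first suggestion above; the script path and its argument are hypothetical and not part of this PR:

```dockerfile
# Hypothetical refactor: both vLLM stages run one shared install script, so
# the uv/venv setup lives in a single place and cannot drift between variants.
FROM llamacpp AS vllm
COPY scripts/install-vllm-env.sh /tmp/
RUN /tmp/install-vllm-env.sh cuda

FROM rocm/vllm-dev:base AS vllm-rocm
COPY scripts/install-vllm-env.sh /tmp/
RUN /tmp/install-vllm-env.sh rocm
```
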
## Individual Comments

### Comment 1
<location path="Dockerfile" line_range="132-138" />
<code_context>
+
+# Install uv and vLLM with ROCm support as modelrunner user.
+# Uses upstream vLLM ROCm wheels (vLLM gained native ROCm support upstream).
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh \
+    && ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
+    && . /opt/vllm-env/bin/activate \
+    && ~/.local/bin/uv pip install vllm \
+         --extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
+         --extra-index-url https://download.pytorch.org/whl/${PYTORCH_ROCM_VERSION} \
+         --index-strategy unsafe-best-match
+
+RUN /opt/vllm-env/bin/python3.12 -c "import vllm; print(vllm.__version__)" > /opt/vllm-env/version
</code_context>
<issue_to_address>
**issue (bug_risk):** vLLM version arg is not used to pin the installed package, which could desync from the ROCm wheel index path.

`VLLM_VERSION` is only used in the ROCm wheel index URL, while `uv pip install vllm` pulls the latest version from the indexes. If a newer release appears, `uv` may install a different version than the one implied by the ROCm index path, causing resolution errors or mismatched binaries.

Please pin `vllm` to `VLLM_VERSION` so the installed package matches the ROCm wheel index:

```dockerfile
&& ~/.local/bin/uv pip install "vllm==${VLLM_VERSION}" \
     --extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \
     --extra-index-url https://download.pytorch.org/whl/${PYTORCH_ROCM_VERSION} \
     --index-strategy unsafe-best-match
```
</issue_to_address>

### Comment 2
<location path="Dockerfile" line_range="124" />
<code_context>
+
+USER root
+
+RUN apt update && apt install -y python3.12 python3.12-venv python3.12-dev curl ca-certificates build-essential && rm -rf /var/lib/apt/lists/*
+
+RUN mkdir -p /opt/vllm-env && chown -R modelrunner:modelrunner /opt/vllm-env
</code_context>
<issue_to_address>
**suggestion (performance):** Consider using --no-install-recommends to keep the ROCm vLLM image smaller and more deterministic.

You likely only need the explicitly listed packages, so preventing installation of recommended extras will keep this base image lean and more reproducible:

```dockerfile
RUN apt update && apt install -y --no-install-recommends \
    python3.12 python3.12-venv python3.12-dev curl ca-certificates build-essential \
    && rm -rf /var/lib/apt/lists/*
```

```suggestion
RUN apt update && apt install -y --no-install-recommends python3.12 python3.12-venv python3.12-dev curl ca-certificates build-essential && rm -rf /var/lib/apt/lists/*
```
</issue_to_address>


dmedovich (Contributor, Author) commented:

@ericcurtin yo! Take a look when you have some free time.

Thanks!

gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces ROCm (AMD GPU) support for vLLM by adding a new Docker build stage and corresponding Makefile targets. While the infrastructure additions are appropriate, several critical correctness issues were identified in the Dockerfile: the vLLM version is invalid, the ROCm 6.3 specification will cause build failures and runtime library mismatches with the base image, and the wheel index URL is missing a required version prefix.

Comment thread on Dockerfile (outdated):
# so the resulting image contains the ROCm runtime that vLLM links against.
FROM llamacpp AS vllm-rocm

ARG VLLM_VERSION=0.19.1

critical

The version 0.19.1 appears to be invalid for vLLM. Upstream vLLM versions are currently in the 0.7.x range. Using an incorrect version will cause the wheel installation to fail as it won't be found on the index. Please verify the intended version (e.g., 0.7.2).

ARG VLLM_VERSION=0.7.2

Comment thread on Dockerfile (outdated), lines +118 to +119:
ARG VLLM_ROCM_VERSION=rocm63
ARG PYTORCH_ROCM_VERSION=rocm6.3

critical

The ROCm versions specified here are problematic for the following reasons:

  1. PyTorch Index 404: PyTorch does not currently host wheels for ROCm 6.3 on their official index. https://download.pytorch.org/whl/rocm6.3 will return a 404 error, causing the build to fail. The latest stable version is rocm6.2.
  2. Base Image Mismatch: The llamacpp base image (resolved from ghcr.io/ggml-org/llama.cpp:server-rocm) is built using ROCm 6.2. Installing ROCm 6.3 wheels on top of a ROCm 6.2 runtime will lead to critical failures at runtime due to incompatible shared library versions (e.g., libamdhip64.so.6).
ARG VLLM_ROCM_VERSION=rocm62
ARG PYTORCH_ROCM_VERSION=rocm6.2

Comment thread on Dockerfile (outdated):
&& ~/.local/bin/uv venv --python 3.12 /opt/vllm-env \
&& . /opt/vllm-env/bin/activate \
&& ~/.local/bin/uv pip install vllm \
--extra-index-url https://wheels.vllm.ai/${VLLM_VERSION}/${VLLM_ROCM_VERSION} \

critical

The URL path for vLLM wheels on wheels.vllm.ai typically requires a v prefix before the version number (e.g., https://wheels.vllm.ai/v0.7.2/rocm62). Without this prefix, the URL will likely return a 404 error.

         --extra-index-url https://wheels.vllm.ai/v${VLLM_VERSION}/${VLLM_ROCM_VERSION} \

Adds a new Docker build target (final-vllm-rocm) that builds upstream
vLLM from source on top of rocm/vllm-dev:base (AMD's pre-built ROCm dev
image containing PyTorch ROCm, Triton, flash-attention, and the ROCm SDK).

vLLM is checked out at the tagged release matching VLLM_VERSION — no
fork, no custom wheels. The runtime stage mirrors the /opt/vllm-env
layout that the existing vLLM backend expects via symlinks, so no Go
code changes are needed.

Unlike the CUDA vllm variant, the ROCm image does not include llama.cpp:
the base images are incompatible (different ROCm runtime versions), so
the ROCm vllm image is intended as a vLLM-only artifact.

Fixes #412
dmedovich force-pushed the feat/vllm-rocm-support branch from 76db8d3 to c285e47 on May 15, 2026 at 19:29
ericcurtin (Contributor) commented:

Very nice @dmedovich ... You could do the project a huge favour actually... We are lacking people with AMD GPUs, if you could confirm llama.cpp/vLLM works with AMD GPUs that would be great! We are also supposed to automatically detect when your machine has an AMD GPU and just do the right thing in that case, that's another thing, we don't know if it works...

Merging on green build

ericcurtin merged commit 2b8e451 into docker:main on May 16, 2026
8 checks passed
dmedovich (Contributor, Author) commented:

@ericcurtin
Checked the auto-detection code — it calls docker.Info() via the Moby client to look up registered runtimes (nvidia/rocm/musa). I'm running Podman, not Docker, so I can't test that path right now. Worth noting that Podman users would always get GPUSupportNone regardless of what GPU they have.

Will install Docker a bit later and check all three points, will come back with detailed info on each.
