From fe72c7312a681961545b43aeb9f72b1671ed33e4 Mon Sep 17 00:00:00 2001
From: AlexHe99
Date: Tue, 24 Dec 2024 18:47:05 +0800
Subject: [PATCH 1/5] Update deploying_with_k8s.md with AMD ROCm GPU example

Add the example of using AMD ROCm GPU

Signed-off-by: Alex He
---
 docs/source/serving/deploying_with_k8s.md | 73 +++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/docs/source/serving/deploying_with_k8s.md b/docs/source/serving/deploying_with_k8s.md
index d27db826cd00..81ffc3e3703a 100644
--- a/docs/source/serving/deploying_with_k8s.md
+++ b/docs/source/serving/deploying_with_k8s.md
@@ -119,6 +119,79 @@ spec:
       periodSeconds: 5
 ```
 
+- AMD ROCm GPU
+
+You can refer to the `deployment.yaml` below if using AMD ROCm GPU like MI300X.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: mistral-7b
+  namespace: default
+  labels:
+    app: mistral-7b
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: mistral-7b
+  template:
+    metadata:
+      labels:
+        app: mistral-7b
+    spec:
+      volumes:
+      # PVC
+      - name: cache-volume
+        persistentVolumeClaim:
+          claimName: mistral-7b
+      # vLLM needs to access the host's shared memory for tensor parallel inference.
+      - name: shm
+        emptyDir:
+          medium: Memory
+          sizeLimit: "8Gi"
+      hostNetwork: true
+      hostIPC: true
+      containers:
+      - name: mistral-7b
+        image: rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
+        securityContext:
+          seccompProfile:
+            type: Unconfined
+          runAsGroup: 44
+          capabilities:
+            add:
+            - SYS_PTRACE
+        command: ["/bin/sh", "-c"]
+        args: [
+          "vllm serve mistralai/Mistral-7B-v0.3 --port 8000 --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
+        ]
+        env:
+        - name: HUGGING_FACE_HUB_TOKEN
+          valueFrom:
+            secretKeyRef:
+              name: hf-token-secret
+              key: token
+        ports:
+        - containerPort: 8000
+        resources:
+          limits:
+            cpu: "10"
+            memory: 20G
+            amd.com/gpu: "1"
+          requests:
+            cpu: "6"
+            memory: 6G
+            amd.com/gpu: "1"
+        volumeMounts:
+        - name: cache-volume
+          mountPath: /root/.cache/huggingface
+        - name: shm
+          mountPath: /dev/shm
+```
+The full example is at https://github.com/ROCm/k8s-device-plugin/tree/master/example/vllm-serve.
+
 2. **Create a Kubernetes Service for vLLM**
 
 Next, create a Kubernetes Service file to expose the `mistral-7b` deployment:

From b8401799c52cd58d367af915b0d5c908b5fffb32 Mon Sep 17 00:00:00 2001
From: AlexHe99
Date: Fri, 27 Dec 2024 10:09:12 +0800
Subject: [PATCH 2/5] Update docs/source/serving/deploying_with_k8s.md

Good suggestion! Thank you.

Co-authored-by: Cyrus Leung
Signed-off-by: Alex He
---
 docs/source/serving/deploying_with_k8s.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/serving/deploying_with_k8s.md b/docs/source/serving/deploying_with_k8s.md
index 81ffc3e3703a..f25e7640472e 100644
--- a/docs/source/serving/deploying_with_k8s.md
+++ b/docs/source/serving/deploying_with_k8s.md
@@ -190,7 +190,7 @@ spec:
   - name: shm
     mountPath: /dev/shm
 ```
-The full example is at https://github.com/ROCm/k8s-device-plugin/tree/master/example/vllm-serve.
+The full example is at <https://github.com/ROCm/k8s-device-plugin/tree/master/example/vllm-serve>.
 
 2. **Create a Kubernetes Service for vLLM**

From 5d7a897d1e6b0ecd64ee63f37901d4030bf7f883 Mon Sep 17 00:00:00 2001
From: AlexHe99
Date: Fri, 27 Dec 2024 10:29:14 +0800
Subject: [PATCH 3/5] Update deploying_with_k8s.md

- Split it to two sub-section about the writing deployment.yaml for
  NVIDIA GPU and AMD GPU.

Signed-off-by: Alex He
---
 docs/source/serving/deploying_with_k8s.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/docs/source/serving/deploying_with_k8s.md b/docs/source/serving/deploying_with_k8s.md
index f25e7640472e..77f848088ea4 100644
--- a/docs/source/serving/deploying_with_k8s.md
+++ b/docs/source/serving/deploying_with_k8s.md
@@ -47,7 +47,11 @@ data:
   token: "REPLACE_WITH_TOKEN"
 ```
 
-Create a deployment file for vLLM to run the model server. The following example deploys the `Mistral-7B-Instruct-v0.3` model:
+Next to create the deployment file for vLLM to run the model server. The following example deploys the `Mistral-7B-Instruct-v0.3` model.
+
+Here are two examples for using NVIDIA GPU and AMD GPU.
+
+- NVIDIA GPU
 
 ```yaml
 apiVersion: apps/v1
@@ -119,7 +123,7 @@ spec:
       periodSeconds: 5
 ```
 
-- AMD ROCm GPU
+- AMD GPU
 
 You can refer to the `deployment.yaml` below if using AMD ROCm GPU like MI300X.
 
@@ -190,7 +194,7 @@ spec:
   - name: shm
     mountPath: /dev/shm
 ```
-The full example is at <https://github.com/ROCm/k8s-device-plugin/tree/master/example/vllm-serve>.
+You can get the full example with steps and sample yaml files from <https://github.com/ROCm/k8s-device-plugin/tree/master/example/vllm-serve>.
 
 2. **Create a Kubernetes Service for vLLM**

From 3fc12d017052cb2f3a4e41f29910457cc08f1736 Mon Sep 17 00:00:00 2001
From: AlexHe99
Date: Fri, 27 Dec 2024 10:30:21 +0800
Subject: [PATCH 4/5] Update deploying_with_k8s.md

Signed-off-by: Alex He
---
 docs/source/serving/deploying_with_k8s.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/serving/deploying_with_k8s.md b/docs/source/serving/deploying_with_k8s.md
index 77f848088ea4..47ad926e2080 100644
--- a/docs/source/serving/deploying_with_k8s.md
+++ b/docs/source/serving/deploying_with_k8s.md
@@ -49,7 +49,7 @@ data:
 
 Next to create the deployment file for vLLM to run the model server. The following example deploys the `Mistral-7B-Instruct-v0.3` model.
 
-Here are two examples for using NVIDIA GPU and AMD GPU.
+Here are two exampels for using NVIDIA GPU and AMD GPU.
 
 - NVIDIA GPU

From aabd116b6cd2e3b439c00773fb382f45a375f9db Mon Sep 17 00:00:00 2001
From: AlexHe99
Date: Fri, 27 Dec 2024 10:34:50 +0800
Subject: [PATCH 5/5] Update deploying_with_k8s.md

Signed-off-by: Alex He
---
 docs/source/serving/deploying_with_k8s.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/serving/deploying_with_k8s.md b/docs/source/serving/deploying_with_k8s.md
index 47ad926e2080..77f848088ea4 100644
--- a/docs/source/serving/deploying_with_k8s.md
+++ b/docs/source/serving/deploying_with_k8s.md
@@ -49,7 +49,7 @@ data:
 
 Next to create the deployment file for vLLM to run the model server. The following example deploys the `Mistral-7B-Instruct-v0.3` model.
 
-Here are two exampels for using NVIDIA GPU and AMD GPU.
+Here are two examples for using NVIDIA GPU and AMD GPU.
 
 - NVIDIA GPU
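The deployment these patches add runs `vllm serve` on port 8000, and vLLM exposes an OpenAI-compatible REST API there. Once the `mistral-7b` Service from the next step is reachable (for example via `kubectl port-forward svc/mistral-7b 8000:8000`), a client can POST to `/v1/completions`. A minimal client sketch follows; the base URL assumes a local port-forward, and the helper name `completion_request` is ours for illustration, not part of vLLM:

```python
import json
import urllib.request

# Assumption: `kubectl port-forward svc/mistral-7b 8000:8000` is running,
# so the vLLM server is reachable on localhost.
BASE_URL = "http://localhost:8000"

def completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a request for vLLM's OpenAI-compatible
    /v1/completions endpoint."""
    body = json.dumps({
        # Must match the model passed to `vllm serve` in the deployment args.
        "model": "mistralai/Mistral-7B-v0.3",
        "prompt": prompt,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = completion_request("San Francisco is a")
# To actually query the server: urllib.request.urlopen(req)
```

Sending the request returns a JSON body whose `choices[0].text` field holds the generated completion, which is a quick way to verify the pod and Service are wired up correctly.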