Error in launch using a docker image #4242

Reminder

  • I have read the README and searched the existing issues.

System Info

System: Ubuntu 20.04.2 LTS
GPU: NVIDIA A100-SXM4-80GB
Docker: 24.0.0
Docker Compose: v2.17.3
llamafactory: 0.8.2.dev0

Reproduction

Dockerfile: https://github.com/hiyouga/LLaMA-Factory/blob/557891debb8a64b73eea012f99780a7b76424cd5/Dockerfile

Build Command:

docker build -f ./Dockerfile \
    --build-arg INSTALL_BNB=true \
    --build-arg INSTALL_VLLM=true \
    --build-arg INSTALL_DEEPSPEED=true \
    --build-arg PIP_INDEX=https://pypi.tuna.tsinghua.edu.cn/simple \
    -t llamafactory:latest .
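As a quick sanity check after the build, the image can be asked for the PyTorch build it ships before wiring it into Compose (a sketch; it assumes the image does not set an ENTRYPOINT that swallows the command, and that python3 is on PATH, which holds for the NGC PyTorch base images):

docker run --rm llamafactory:latest \
    python3 -c "import torch; print(torch.__version__, torch.version.cuda)"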

docker-compose.yml

name: llm-fct

services:
  webui:
    image: llamafactory:latest
    command: ["llamafactory-cli", "webui"]
    volumes:
      - /models:/models
      - ./hf_cache:/root/.cache/huggingface/
      - ./data:/app/data
      - ./output:/app/output
    ports:
      - "7860:7860"
      - "8000:8000"
    ipc: host
    security_opt:
      - seccomp:unconfined
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: "all"
            capabilities: [gpu]
    restart: unless-stopped

Startup Command:
docker compose -f docker-compose.yml up -d
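Since the stack runs detached, the logs of the webui service can be followed to capture the traceback below, and the GPU reservation can be verified from inside the container once it is running (standard Docker Compose commands; webui is the service name from the compose file above):

# follow the service logs (this is where the traceback below shows up)
docker compose logs -f webui
# confirm the GPU is visible inside the running container
docker compose exec webui nvidia-smi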

Error:
llm-fct-webui-1 | Traceback (most recent call last):
llm-fct-webui-1 |   File "/usr/local/bin/llamafactory-cli", line 5, in <module>
llm-fct-webui-1 |     from llamafactory.cli import main
llm-fct-webui-1 |   File "/app/src/llamafactory/__init__.py", line 3, in <module>
llm-fct-webui-1 |     from .cli import VERSION
llm-fct-webui-1 |   File "/app/src/llamafactory/cli.py", line 7, in <module>
llm-fct-webui-1 |     from . import launcher
llm-fct-webui-1 |   File "/app/src/llamafactory/launcher.py", line 1, in <module>
llm-fct-webui-1 |     from llamafactory.train.tuner import run_exp
llm-fct-webui-1 |   File "/app/src/llamafactory/train/tuner.py", line 10, in <module>
llm-fct-webui-1 |     from ..model import load_model, load_tokenizer
llm-fct-webui-1 |   File "/app/src/llamafactory/model/__init__.py", line 1, in <module>
llm-fct-webui-1 |     from .loader import load_config, load_model, load_tokenizer
llm-fct-webui-1 |   File "/app/src/llamafactory/model/loader.py", line 13, in <module>
llm-fct-webui-1 |     from .patcher import patch_config, patch_model, patch_tokenizer, patch_valuehead_model
llm-fct-webui-1 |   File "/app/src/llamafactory/model/patcher.py", line 16, in <module>
llm-fct-webui-1 |     from .model_utils.longlora import configure_longlora
llm-fct-webui-1 |   File "/app/src/llamafactory/model/model_utils/longlora.py", line 6, in <module>
llm-fct-webui-1 |     from transformers.models.llama.modeling_llama import (
llm-fct-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 54, in <module>
llm-fct-webui-1 |     from flash_attn import flash_attn_func, flash_attn_varlen_func
llm-fct-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
llm-fct-webui-1 |     from flash_attn.flash_attn_interface import (
llm-fct-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
llm-fct-webui-1 |     import flash_attn_2_cuda as flash_attn_cuda
llm-fct-webui-1 | ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
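The undefined symbol _ZN3c104cuda9SetDeviceEi demangles to c10::cuda::SetDevice(int), a symbol from PyTorch's c10 library, which suggests the prebuilt flash_attn_2_cuda extension was compiled against a different PyTorch build than the one shipped in the container. A diagnostic sketch using standard c++filt/python/pip invocations, run from a shell inside the container (rebuilding flash-attn is slow, since it compiles the CUDA kernels from source):

# demangle the symbol to see which library it belongs to
c++filt _ZN3c104cuda9SetDeviceEi
# print the PyTorch build the container actually ships
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
# show the installed flash-attn wheel
pip show flash-attn
# rebuilding flash-attn from source against the installed torch
# usually resolves this kind of ABI mismatch
pip install --no-build-isolation --force-reinstall flash-attn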

Expected behavior

The container starts successfully and the Web UI is reachable on the mapped port 7860.

Others

There may be a relevant solution in oobabooga/text-generation-webui#4182.
I also found that everything works fine when nvcr.io/nvidia/pytorch:24.01-py3 is used as the base image instead of nvcr.io/nvidia/pytorch:24.02-py3.
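If pinning the base image is the chosen workaround, one option is to edit the FROM line at the top of the Dockerfile before rebuilding (a sketch; it assumes the Dockerfile does not already expose the base image as a build argument):

# pin the NGC image whose bundled PyTorch matches the prebuilt flash-attn wheel
FROM nvcr.io/nvidia/pytorch:24.01-py3

and then rebuild with the same docker build command as above.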
