Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ dist
.idea
.vscode
tmp/
requirements-musa.txt
12 changes: 8 additions & 4 deletions docs/CN/source/getting_started/installation.rst
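@@ -27,7 +27,7 @@ Lightllm is an inference framework developed purely in Python, whose operators use triton
$ # Beforehand, please make sure that enough shared memory is allocated in your docker settings; otherwise the
$ # service may fail to start properly.
$ # 1. For text-only services, allocating more than 2GB of shared memory is recommended. If you have enough memory, 16GB or more is recommended.
$ # 2. For multimodal services, allocating 16GB or more of shared memory is recommended. Adjust this to your actual situation.
$ # If you do not have enough shared memory, you can try lowering the --running_max_req_size parameter when starting the service. This reduces
$ # the number of concurrent requests, but also reduces shared memory usage. For multimodal services, you can also lower the --cache_capacity
$ # parameter to reduce shared memory usage.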
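@@ -38,7 +38,7 @@ Lightllm is an inference framework developed purely in Python, whose operators use triton
You can also build the image manually from source and run it. Building it manually is recommended, because updates are frequent: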

.. code-block:: console

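$ # Move into the root directory of the code repository
$ cd /lightllm
$ # Manually build the image. The docker directory contains image build files for different scenarios; build the one you need.
@@ -52,7 +52,7 @@ Lightllm is an inference framework developed purely in Python, whose operators use triton
Or you can use the script to launch the image and run it with one click: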

.. code-block:: console

$ # View script parameters
$ python tools/quick_launch_docker.py --help

@@ -80,6 +80,10 @@ Lightllm is an inference framework developed purely in Python, whose operators use triton
$ # Install Lightllm dependencies (cuda 12.4)
$ pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu124
$
$ # Install Lightllm dependencies (Moore Threads GPU)
$ ./generate_requirements_musa.sh
medium

The command ./generate_requirements_musa.sh requires the script to have execute permissions. A user cloning the repository might not have this permission set by default, forcing them to run chmod +x first. To provide a smoother experience, you can invoke the script with bash directly, which doesn't require the execute bit.

Suggested change
$ ./generate_requirements_musa.sh
$ bash generate_requirements_musa.sh
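
The execute-bit point can be checked with a short sketch. The file name `demo_script.sh` and the temporary directory are hypothetical, made up for illustration; they are not part of this PR.

```shell
#!/bin/bash
# Hypothetical demo: a script without the execute bit fails via ./ but runs via bash.
tmpdir="$(mktemp -d)"
printf '#!/bin/bash\necho "generated"\n' > "$tmpdir/demo_script.sh"
chmod a-x "$tmpdir/demo_script.sh"   # simulate a checkout without the execute bit

# Direct invocation needs the execute bit, so this branch is expected to fail.
if (cd "$tmpdir" && ./demo_script.sh) 2>/dev/null; then
    direct_status="ok"
else
    direct_status="failed"
fi

# Invoking through bash only needs read permission on the file.
bash_output="$(bash "$tmpdir/demo_script.sh")"

printf 'direct: %s\nvia bash: %s\n' "$direct_status" "$bash_output"
rm -rf "$tmpdir"
```

On a typical Linux filesystem the direct call is rejected with "Permission denied", while the `bash` call runs the script normally.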

$ pip install -r requirements-musa.txt
$
$ # Install Lightllm
$ python setup.py install

@@ -97,6 +101,6 @@ Lightllm is an inference framework developed purely in Python, whose operators use triton
.. code-block:: console

$ pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly --no-deps

For the specific reasons, please refer to: `issue <https://github.com/triton-lang/triton/issues/3619>`_ and `fix PR <https://github.com/triton-lang/triton/pull/3638>`_

22 changes: 13 additions & 9 deletions docs/EN/source/getting_started/installation.rst
@@ -24,16 +24,16 @@ The easiest way to install Lightllm is using the official image. You can directl
$ docker pull ghcr.io/modeltc/lightllm:main
$
$ # Run. The current LightLLM service relies heavily on shared memory.
$ # Before starting, please make sure that you have allocated enough shared memory
$ # in your Docker settings; otherwise, the service may fail to start properly.
$ #
$ # 1. For text-only services, it is recommended to allocate more than 2GB of shared memory.
$ # If your system has sufficient RAM, allocating 16GB or more is recommended.
$ # 2. For multimodal services, it is recommended to allocate 16GB or more of shared memory.
$ # You can adjust this value according to your specific requirements.
$ #
$ # If you do not have enough shared memory available, you can try lowering
$ # the --running_max_req_size parameter when starting the service.
$ # This will reduce the number of concurrent requests, but also decrease shared memory usage.
$ docker run -it --gpus all -p 8080:8080 \
$ --shm-size 2g -v your_local_path:/data/ \
@@ -42,21 +42,21 @@ The easiest way to install Lightllm is using the official image. You can directl
You can also manually build the image from source and run it:

.. code-block:: console

$ # move into lightllm root dir
$ cd /lightllm
$ # Manually build the image
$ docker build -t <image_name> -f ./docker/Dockerfile .
$
$ # Run
$ docker run -it --gpus all -p 8080:8080 \
$ --shm-size 2g -v your_local_path:/data/ \
$ <image_name> /bin/bash

Or you can directly use the script to launch the image and run it with one click:

.. code-block:: console

$ # View script parameters
$ python tools/quick_launch_docker.py --help

@@ -84,6 +84,10 @@ You can also install Lightllm from source:
$ # Install Lightllm dependencies (cuda 12.4)
$ pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu124
$
$ # Install Lightllm dependencies (Moore Threads GPU)
$ ./generate_requirements_musa.sh
medium

The command ./generate_requirements_musa.sh requires the script to have execute permissions. A user cloning the repository might not have this permission set by default, forcing them to run chmod +x first. To provide a smoother experience, you can invoke the script with bash directly, which doesn't require the execute bit.

Suggested change
$ ./generate_requirements_musa.sh
$ bash generate_requirements_musa.sh

$ pip install -r requirements-musa.txt
$
$ # Install Lightllm
$ python setup.py install

@@ -101,5 +105,5 @@ You can also install Lightllm from source:
.. code-block:: console

$ pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly --no-deps

For specific reasons, please refer to: `issue <https://github.com/triton-lang/triton/issues/3619>`_ and `fix PR <https://github.com/triton-lang/triton/pull/3638>`_
105 changes: 105 additions & 0 deletions generate_requirements_musa.sh
@@ -0,0 +1,105 @@
#!/bin/bash
# Script to generate requirements-musa.txt from requirements.txt
# MUSA is not compatible with CUDA packages, so they need to be removed
# Torch-related packages are pre-installed in the MUSA docker container

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
INPUT_FILE="${SCRIPT_DIR}/requirements.txt"
OUTPUT_FILE="${SCRIPT_DIR}/requirements-musa.txt"

if [ ! -f "$INPUT_FILE" ]; then
    echo "Error: requirements.txt not found at $INPUT_FILE"
    exit 1
fi

echo "Generating requirements-musa.txt from requirements.txt..."

# Define patterns to remove (CUDA-specific packages)
# These packages are not compatible with MUSA
CUDA_PACKAGES=(
    "^cupy"           # cupy-cuda12x and similar
    "^cuda_bindings"  # CUDA bindings
    "^nixl"           # NIXL (NVIDIA Inter-node eXchange Library)
    "^flashinfer"     # flashinfer-python (CUDA-specific attention kernel)
    "^sgl-kernel"     # SGL kernel (CUDA-specific)
)

# Define torch-related packages (pre-installed in MUSA container, remove version pins)
TORCH_PACKAGES=(
    "^torch=="
    "^torchvision=="
)

# Create the output file with a header comment
cat > "$OUTPUT_FILE" << 'EOF'
# Requirements for MUSA (Moore Threads GPU)
# Auto-generated from requirements.txt by generate_requirements_musa.sh
# CUDA-specific packages have been removed
# Torch-related packages have version pins removed (pre-installed in MUSA container)

EOF

# Process the requirements file
while IFS= read -r line || [ -n "$line" ]; do
    # Skip empty lines and comments (but keep them in output)
    if [[ -z "$line" || "$line" =~ ^[[:space:]]*# ]]; then
        echo "$line" >> "$OUTPUT_FILE"
        continue
    fi

    # Extract package name (before ==, >=, <=, ~=, etc.)
    pkg_name=$(echo "$line" | sed -E 's/^([a-zA-Z0-9_-]+).*/\1/')
high

The regular expression used to extract the package name is not robust. It doesn't account for dots (.) in package names (e.g., ruamel.yaml), which will lead to incorrect package name extraction. This may cause the script to fail to filter packages as intended if a package with a dot in its name needs to be filtered. A more reliable method is to remove the version specifier and everything that follows it.

Suggested change
pkg_name=$(echo "$line" | sed -E 's/^([a-zA-Z0-9_-]+).*/\1/')
pkg_name=$(echo "$line" | sed -E 's/[<>=!~].*//')
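
The difference between the two expressions can be seen on a dotted package name. This is a minimal sketch; the sample requirement line is illustrative and is not taken from the repository's requirements.txt.

```shell
#!/bin/bash
# Illustrative requirement line with a dot in the package name.
line="ruamel.yaml==0.17.21"

# Original expression: the character class has no dot, so the capture stops at "ruamel".
original=$(printf '%s\n' "$line" | sed -E 's/^([a-zA-Z0-9_-]+).*/\1/')

# Suggested expression: drop the first version-specifier character and everything after it.
suggested=$(printf '%s\n' "$line" | sed -E 's/[<>=!~].*//')

printf 'original:  %s\nsuggested: %s\n' "$original" "$suggested"
```

One caveat with the suggested form: a line written as `pkg >= 1.0` keeps the space before the operator, so the extracted name would be `pkg ` with a trailing space. That does not affect the start-anchored patterns in `CUDA_PACKAGES`.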


    # Check if this is a CUDA package to skip
    skip=false
    for pattern in "${CUDA_PACKAGES[@]}"; do
        if [[ "$pkg_name" =~ $pattern ]]; then
            echo " Removing CUDA package: $line"
            skip=true
            break
        fi
    done

    if $skip; then
        continue
    fi

    # Check if this is a torch-related package (remove version pin)
    for pattern in "${TORCH_PACKAGES[@]}"; do
        if [[ "$line" =~ $pattern ]]; then
            # Remove version pin, keep just the package name
            pkg_only=$(echo "$line" | sed -E 's/==.*//')
            echo " Unpinning version for: $pkg_only (pre-installed in MUSA container)"
            echo "$pkg_only" >> "$OUTPUT_FILE"
            skip=true
            break
        fi
    done

    if $skip; then
        continue
    fi

    # Keep the package as-is
    echo "$line" >> "$OUTPUT_FILE"

done < "$INPUT_FILE"
Comment on lines +45 to +88
medium

Throughout this loop, echo is used to print variables to files and standard output (e.g., lines 48, 59, 74, 75, 86). This can be unsafe if the variable content starts with a hyphen or contains backslashes, as echo might interpret them as options. It's a best practice in shell scripting to use printf for printing variable data to avoid unexpected behavior.

For example:

  • line 48 could be printf "%s\n" "$line" >> "$OUTPUT_FILE"
  • line 59 could be printf " Removing CUDA package: %s\n" "$line"
  • line 74 could be printf " Unpinning version for: %s (pre-installed in MUSA container)\n" "$pkg_only"
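
A minimal sketch of the hazard, assuming the script runs under bash with its builtin `echo` in the default configuration. The value `-n` is an illustrative worst case, not a line from requirements.txt.

```shell
#!/bin/bash
# Illustrative value: a line that looks exactly like an echo option.
line="-n"

# bash's builtin echo consumes "-n" as an option, so nothing is printed.
echo_output=$(echo "$line")

# printf with a fixed format string prints the value verbatim.
printf_output=$(printf '%s\n' "$line")

printf 'echo:   [%s]\nprintf: [%s]\n' "$echo_output" "$printf_output"
```

In practice the exposure is narrow: bash's `echo` only treats an argument as options when it consists entirely of the letters `n`, `e`, and `E` after the hyphen, so a requirements line such as `--extra-index-url ...` passes through unchanged. `printf` removes the edge case altogether.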


# Add MUSA-specific packages at the end
cat >> "$OUTPUT_FILE" << 'EOF'

# MUSA-specific packages
torch_musa
torchada
EOF

echo ""
echo "Successfully generated: $OUTPUT_FILE"
echo ""
echo "Summary of changes:"
echo " - Removed CUDA-specific packages: cupy-cuda12x, cuda_bindings, nixl, flashinfer-python, sgl-kernel"
echo " - Unpinned torch-related packages: torch, torchvision (pre-installed in MUSA container)"
echo " - Added MUSA-specific packages: torch_musa, torchada"