ci(package-c): build pytorch plugin without bundled runtime #5706

Draft
njzjz wants to merge 1 commit into deepmodeling:master from njzjz:ci/package-c-pytorch-runtime-external

@njzjz njzjz commented Jul 1, 2026

Member

Summary

  • build the C package in the manylinux_2_28 CUDA 12.9 image and install TensorFlow/PyTorch from dependency groups
  • package the PyTorch backend plugin while excluding libtorch/CUDA runtime libraries from the tarball
  • add package README/download_libtorch.sh guidance and update C package tests for external PyTorch runtime
  • avoid CUDA 12.9 CCCL failures from -arch=all by using all-major as the CUDA 12.9 default

Tests

  • ruff check .
  • ruff format .
  • git diff --check
  • sh -n source/install/package_c.sh
  • bash -n source/install/docker_package_c.sh
  • bash -n source/install/docker_test_package_c.sh
  • fork CI on bbcd58d: Build C library, Build C++, Build/upload to PyPI, CodeQL, Test C++, and Test Python all passed

Fork CI: https://github.com/njzjz/deepmd-kit/actions/runs/28547373402

Summary by CodeRabbit

  • New Features

    • Expanded C library packaging and installation support for PyTorch alongside TensorFlow and JAX.
    • Added packaged runtime guidance so users can download and use the matching PyTorch runtime more easily.
  • Bug Fixes

    • Improved packaged library validation to catch missing shared libraries before release.
    • Reduced the chance of bundling incompatible runtime components.
  • Documentation

    • Updated install instructions with clearer PyTorch compatibility and runtime setup steps.

Copilot AI review requested due to automatic review settings July 1, 2026 22:11
@dosubot dosubot Bot added the build label Jul 1, 2026

Copilot AI left a comment

Contributor

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

coderabbitai Bot commented Jul 1, 2026

Contributor

📝 Walkthrough

This PR adds PyTorch backend support to the C library packaging pipeline: CMake filters/bundles runtime dependencies via configurable exclude regexes, packaging and Docker scripts generate a libtorch downloader when PyTorch is enabled, CI validates the packaged tarball and tests PyTorch runtime loading, docs are updated, and CUDA architecture defaults are refined.

Changes

PyTorch runtime packaging

| Layer / File(s) | Summary |
| --- | --- |
| **CMake runtime dependency exclusion and install logic**<br>`source/api_c/CMakeLists.txt` | Adds cached pre/post-exclude regex variables, rebuilds runtime library list with backend plugin targets, injects regexes into file(GET_RUNTIME_DEPENDENCIES), and adjusts install targets for libname, LIB_DEEPMD_OP, and pt/pd backends. |
| **package_c.sh PyTorch-aware build and libtorch downloader**<br>`source/install/package_c.sh` | Updates path handling, adds Python/CUDA nvrtc detection, parameterizes CMake with PyTorch/exclude-regex options, and generates download_libtorch.sh, README, and libtorch_env.sh when PyTorch is enabled. |
| **Docker packaging container migration**<br>`source/install/docker_package_c.sh` | Switches from a tensorflow/build image to a configurable manylinux_cuda image, adds Python/PyTorch group environment defaults, and installs dependencies via uv before invoking package_c.sh. |
| **Docker test container PyTorch runtime validation**<br>`source/install/docker_test_package_c.sh` | Rewritten to use manylinux 2.28, compiles/runs C examples, and conditionally validates PyTorch runtime loading via ldd and ctypes.CDLL when CHECK_PYTORCH_RUNTIME is set. |
| **CI workflow tarball validation and testing**<br>`.github/workflows/package_c.yml` | Adds matrix exclude regexes, enables PyTorch packaging, verifies tarball contents to prevent bundling PyTorch/CUDA runtime libs, writes a step summary with the download URL, and enables PyTorch runtime test checks. |
| **Installation documentation update**<br>`doc/install/install-from-c-library.md` | Adds PyTorch to supported backends and documents libtorch version matching, LD_LIBRARY_PATH setup, and use of download_libtorch.sh/libtorch_env.sh. |
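
The tarball check summarized for `.github/workflows/package_c.yml` can be pictured with a minimal Python sketch. It assumes a hypothetical name pattern and hypothetical helper names (`FORBIDDEN_LIB_RE`, `find_bundled_runtime_libs`, `make_demo_tarball`); the actual exclude regexes used by the workflow are not quoted in this thread, so treat the pattern below as an illustration only.

```python
import io
import re
import tarfile

# Hypothetical pattern: the real exclude regexes live in the workflow matrix
# and are not shown in this thread.
FORBIDDEN_LIB_RE = re.compile(
    r"(^|/)lib(torch|c10|cudart|cublas|cudnn)[^/]*\.so(\.\d+)*$"
)


def find_bundled_runtime_libs(tarball):
    """Return archive members whose names look like PyTorch/CUDA runtime libraries."""
    with tarfile.open(fileobj=tarball, mode="r:*") as archive:
        return sorted(
            name for name in archive.getnames() if FORBIDDEN_LIB_RE.search(name)
        )


def make_demo_tarball(names):
    """Build an in-memory .tar.gz of empty files, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in names:
            archive.addfile(tarfile.TarInfo(name))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = make_demo_tarball(
        ["libdeepmd_c/lib/libdeepmd_c.so", "libdeepmd_c/lib/libtorch_cpu.so"]
    )
    offenders = find_bundled_runtime_libs(demo)
    if offenders:
        print("runtime libraries must not be bundled:", offenders)
```

A CI step built on this idea would fail the job whenever the returned list is non-empty, which is the behavior the summary row describes.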

CUDA architecture default adjustment

| Layer / File(s) | Summary |
| --- | --- |
| **CUDA toolkit version-based architecture default**<br>`source/lib/src/gpu/CMakeLists.txt` | Moves find_package(CUDAToolkit REQUIRED) before architecture defaulting and sets CMAKE_CUDA_ARCHITECTURES to all-major for CUDA toolkit versions from 12.9 up to, but not including, 13.0; otherwise all. |

Estimated code review effort: 4 (Complex) | ~60 minutes

Sequence Diagram(s)

sequenceDiagram
    participant Workflow as CI Workflow
    participant DockerTest as docker_test_package_c.sh
    participant Container as manylinux Container
    participant PythonCheck as ctypes.CDLL check
    Workflow->>DockerTest: run with CHECK_PYTORCH_RUNTIME=1
    DockerTest->>Container: extract libdeepmd_c.tar.gz, compile examples
    Container->>Container: install uv, install PyTorch deps
    Container->>Container: set LD_LIBRARY_PATH, run ldd checks
    Container->>PythonCheck: load shared objects via ctypes.CDLL
    PythonCheck-->>DockerTest: pass/fail result
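
The final step of the diagram, loading shared objects via ctypes.CDLL, might look roughly like the sketch below. The function name `find_unloadable_libraries` and the example path are hypothetical; the actual check in docker_test_package_c.sh is not quoted in this thread.

```python
import ctypes


def find_unloadable_libraries(paths):
    """Try to load each shared object; return {path: error message} for failures."""
    failures = {}
    for path in paths:
        try:
            # CDLL asks the dynamic linker to load the library together with
            # its dependencies, so a missing libtorch surfaces here as OSError.
            ctypes.CDLL(path)
        except OSError as error:
            failures[path] = str(error)
    return failures


if __name__ == "__main__":
    # Hypothetical path, used only to show the failure report.
    for path, message in find_unloadable_libraries(["/nonexistent/libexample.so"]).items():
        print(f"FAILED {path}: {message}")
```

Because the dynamic linker resolves dependencies at load time, a check like this is only meaningful when LD_LIBRARY_PATH already points at the libtorch lib directory before the Python process starts.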

Suggested labels: enhancement

Suggested reviewers: wanghan-iapcm, iProzd

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main change: building the package-c PyTorch plugin without bundling its runtime. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/install/package_c.sh`:
- Around line 119-215: The PyTorch runtime URL derivation in package_c.sh can
fail silently when ENABLE_PYTORCH is true but PYTHON_BIN is missing or the torch
version heuristic does not produce a supported URL. Add a clear warning or
diagnostic in the URL-generation block before the README and
download_libtorch.sh generation is skipped, using the existing
PYTORCH_RUNTIME_DOWNLOAD_URL, ENABLE_PYTORCH, and PYTHON_BIN checks to explain
why no PyTorch runtime package was emitted. Keep the warning tied to the same
install-script flow so local/manual runs surface the missing libtorch packaging
immediately.
- Around line 122-138: The libtorch URL generation in the torch version handling
logic is stripping nightly `.devYYYYMMDD` data by splitting on `.dev`, so fix
the version parsing in the package_c script to preserve the full nightly version
string when building the download URL. Update the `version`/`variant` extraction
path around the `torch.__version__` and `torch.version.cuda` handling so nightly
builds keep their `.dev` suffix while still producing the correct `cpu` or `cu*`
libtorch archive name.

In `@source/lib/src/gpu/CMakeLists.txt`:
- Around line 9-17: The CUDA architecture fallback in CMakeLists.txt is too
narrow because the `all-major` branch in the `find_package(CUDAToolkit)` /
`CMAKE_CUDA_ARCHITECTURES` logic stops at CUDA 12.x. Update the version check so
the `all-major` fallback also applies to CUDA 13.x, keeping the existing
`CUDAToolkit_VERSION` guard and the `set(CMAKE_CUDA_ARCHITECTURES ...)` logic in
place while extending the upper bound appropriately.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3872c764-c654-40a6-9a12-928f465e34ab

📥 Commits

Reviewing files that changed from the base of the PR and between 42de67e and bbcd58d.

📒 Files selected for processing (7)
  • .github/workflows/package_c.yml
  • doc/install/install-from-c-library.md
  • source/api_c/CMakeLists.txt
  • source/install/docker_package_c.sh
  • source/install/docker_test_package_c.sh
  • source/install/package_c.sh
  • source/lib/src/gpu/CMakeLists.txt

Comment on lines +119 to +215
if [ -z "${PYTORCH_RUNTIME_DOWNLOAD_URL:-}" ] && [ "${ENABLE_PYTORCH:-FALSE}" = "TRUE" ] && [ -n "${PYTHON_BIN:-}" ]; then
    PYTORCH_RUNTIME_DOWNLOAD_URL=$(
        "${PYTHON_BIN}" - <<'PY'
from urllib.parse import quote

import torch

version = torch.__version__.split(".dev", 1)[0]
if "+" in version:
    variant = version.split("+", 1)[1]
else:
    cuda_version = torch.version.cuda
    variant = "cu" + cuda_version.replace(".", "") if cuda_version else "cpu"
    version = f"{version}+{variant}"

if variant == "cpu" or variant.startswith("cu"):
    print(
        f"https://download.pytorch.org/libtorch/{variant}/"
        f"libtorch-shared-with-deps-{quote(version, safe='')}.zip"
    )
PY
    )
fi

if [ -n "${PYTORCH_RUNTIME_DOWNLOAD_URL:-}" ]; then
    cat >"${BUILD_TMP_DIR}/libdeepmd_c/README.md" <<EOF
This DeePMD-kit C package was built with PyTorch support, but PyTorch runtime libraries are not bundled.

To use the PyTorch C/C++ backend, install a libtorch runtime that exactly matches the PyTorch version used at build time:

${PYTORCH_RUNTIME_DOWNLOAD_URL}

The PyTorch version must match exactly. The CUDA variant may be omitted only when the target runtime is compatible with the models and hardware you use.
Make the libtorch lib directory discoverable by the dynamic linker, for example by adding it to LD_LIBRARY_PATH.
Run ./download_libtorch.sh from this directory to download and unpack the matching libtorch runtime.
EOF
    cat >"${BUILD_TMP_DIR}/libdeepmd_c/download_libtorch.sh" <<'EOF'
#!/bin/sh

set -eu

LIBTORCH_DOWNLOAD_URL="__PYTORCH_RUNTIME_DOWNLOAD_URL__"
SCRIPT_DIR=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd -P)
DEST_DIR=${1:-"${SCRIPT_DIR}"}
ARCHIVE_PATH=${LIBTORCH_ARCHIVE:-"${SCRIPT_DIR}/libtorch.zip"}

mkdir -p "${DEST_DIR}"

if [ -d "${DEST_DIR}/libtorch" ]; then
    echo "libtorch already exists at ${DEST_DIR}/libtorch"
else
    echo "Downloading ${LIBTORCH_DOWNLOAD_URL}"
    if command -v curl >/dev/null 2>&1; then
        curl -L --fail --retry 3 -o "${ARCHIVE_PATH}" "${LIBTORCH_DOWNLOAD_URL}"
    elif command -v wget >/dev/null 2>&1; then
        wget -O "${ARCHIVE_PATH}" "${LIBTORCH_DOWNLOAD_URL}"
    else
        echo "curl or wget is required to download libtorch." >&2
        exit 1
    fi

    echo "Extracting ${ARCHIVE_PATH} to ${DEST_DIR}"
    if command -v unzip >/dev/null 2>&1; then
        unzip -q -o "${ARCHIVE_PATH}" -d "${DEST_DIR}"
    elif command -v python3 >/dev/null 2>&1; then
        ZIP_ARCHIVE="${ARCHIVE_PATH}" ZIP_DEST_DIR="${DEST_DIR}" python3 - <<'PY'
import os
import zipfile

with zipfile.ZipFile(os.environ["ZIP_ARCHIVE"]) as zip_file:
    zip_file.extractall(os.environ["ZIP_DEST_DIR"])
PY
    elif command -v python >/dev/null 2>&1; then
        ZIP_ARCHIVE="${ARCHIVE_PATH}" ZIP_DEST_DIR="${DEST_DIR}" python - <<'PY'
import os
import zipfile

with zipfile.ZipFile(os.environ["ZIP_ARCHIVE"]) as zip_file:
    zip_file.extractall(os.environ["ZIP_DEST_DIR"])
PY
    else
        echo "unzip or python is required to extract libtorch." >&2
        exit 1
    fi
fi

cat >"${SCRIPT_DIR}/libtorch_env.sh" <<EOF_ENV
export LD_LIBRARY_PATH="${DEST_DIR}/libtorch/lib:\${LD_LIBRARY_PATH:-}"
EOF_ENV

echo "libtorch is available at ${DEST_DIR}/libtorch"
echo "Run this before using the PyTorch C/C++ backend:"
echo " . ${SCRIPT_DIR}/libtorch_env.sh"
EOF
    sed -i "s#__PYTORCH_RUNTIME_DOWNLOAD_URL__#${PYTORCH_RUNTIME_DOWNLOAD_URL}#g" "${BUILD_TMP_DIR}/libdeepmd_c/download_libtorch.sh"
    chmod +x "${BUILD_TMP_DIR}/libdeepmd_c/download_libtorch.sh"
fi

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Silent skip when PyTorch download URL can't be derived.

If ENABLE_PYTORCH=TRUE but PYTHON_BIN is unset, or the version/variant heuristic yields no URL (e.g., unsupported variant), PYTORCH_RUNTIME_DOWNLOAD_URL stays empty and the README/download_libtorch.sh generation is silently skipped with no diagnostic. CI happens to catch this via the tarball content check, but local/manual invocations get no explanation.

💬 Suggested diagnostic warning
 if [ -z "${PYTORCH_RUNTIME_DOWNLOAD_URL:-}" ] && [ "${ENABLE_PYTORCH:-FALSE}" = "TRUE" ] && [ -n "${PYTHON_BIN:-}" ]; then
 	PYTORCH_RUNTIME_DOWNLOAD_URL=$(
 		"${PYTHON_BIN}" - <<'PY'
 ...
 PY
 	)
 fi
+
+if [ "${ENABLE_PYTORCH:-FALSE}" = "TRUE" ] && [ -z "${PYTORCH_RUNTIME_DOWNLOAD_URL:-}" ]; then
+	echo "WARNING: could not derive a PyTorch runtime download URL; README/download_libtorch.sh will not be generated." >&2
+fi
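
The suggested diff emits a single generic warning. If the two causes named in the finding should be reported separately, the decision could be modelled as below. This is a Python sketch of the logic only, written in Python for brevity; the real check would be shell in package_c.sh, and the function name `missing_url_reason` and the message wording are hypothetical.

```python
def missing_url_reason(enable_pytorch, python_bin, download_url):
    """Return a warning explaining why no libtorch URL is available, or None.

    Arguments mirror the shell variables ENABLE_PYTORCH, PYTHON_BIN and
    PYTORCH_RUNTIME_DOWNLOAD_URL as plain strings (empty string means unset).
    """
    if enable_pytorch != "TRUE" or download_url:
        # PyTorch packaging is disabled, or a URL was derived: nothing to report.
        return None
    if not python_bin:
        return (
            "WARNING: ENABLE_PYTORCH=TRUE but PYTHON_BIN is unset; "
            "cannot query torch for a libtorch download URL."
        )
    return (
        "WARNING: the torch variant has no known libtorch archive; "
        "README/download_libtorch.sh will not be generated."
    )


if __name__ == "__main__":
    print(missing_url_reason("TRUE", "", ""))
```

Distinguishing the cases costs one extra branch and tells a local user whether to set PYTHON_BIN or to supply PYTORCH_RUNTIME_DOWNLOAD_URL by hand.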

Comment on lines +122 to +138
from urllib.parse import quote

import torch

version = torch.__version__.split(".dev", 1)[0]
if "+" in version:
    variant = version.split("+", 1)[1]
else:
    cuda_version = torch.version.cuda
    variant = "cu" + cuda_version.replace(".", "") if cuda_version else "cpu"
    version = f"{version}+{variant}"

if variant == "cpu" or variant.startswith("cu"):
    print(
        f"https://download.pytorch.org/libtorch/{variant}/"
        f"libtorch-shared-with-deps-{quote(version, safe='')}.zip"
    )

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

What is the exact filename format for PyTorch nightly libtorch downloads on download.pytorch.org?

💡 Result:

The filename format for LibTorch nightly downloads hosted on download.pytorch.org generally follows a pattern based on the build variant, platform, and version [1][2]. For the latest nightly build, the filename typically uses the suffix -latest.zip [3][4][2]. Common patterns include:

- Generic/CPU: libtorch-shared-with-deps-latest.zip [3][2]
- CUDA-enabled: libtorch-shared-with-deps-latest.zip (located under the specific cuXXX subdirectory) [4][5][2]
- Windows: libtorch-win-shared-with-deps-latest.zip [1]
- macOS (ARM64): libtorch-macos-arm64-latest.zip [1]

For specific versioned nightly builds, the format incorporates the package version, development date, and sometimes CUDA compute architecture [5][6][7][2]:

- Pattern: {prefix}-{version}.dev{date}+{cuda_arch}.zip [5][7]
- Example: libtorch-shared-with-deps-2.8.0.dev20250507+cu128.zip [5][7]

The {prefix} (e.g., libtorch-shared-with-deps or libtorch-win-shared-with-deps) is determined by the platform and build requirements [1].

Note that as of June 2026, requests for these .zip files on download.pytorch.org issue a 301 redirect to download-r2.pytorch.org, so users should ensure their download tools support following redirects (e.g., using curl -L) [8]. Additionally, the use of the cxx11-abi tag in filenames has been largely deprecated in favor of standard binaries, as modern PyTorch builds have shifted to the C++11 ABI by default [5][7].

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the target file around the cited lines.
sed -n '1,220p' source/install/package_c.sh

# Find where this libtorch URL logic is used and whether nightly versions are handled elsewhere.
rg -n "libtorch|\\.dev|torch.__version__|download_libtorch|quote\\(" source/install source || true

Repository: deepmodeling/deepmd-kit

Length of output: 50380


🌐 Web query:

PyTorch libtorch nightly filename format official docs dev date suffix download.pytorch.org libtorch-shared-with-deps-2.8.0.dev20250507+cu128.zip

💡 Result:

The filename format for PyTorch LibTorch nightly builds typically follows a structured pattern used to identify the version, variant, and build date [1][2]. A standard nightly filename generally adheres to the following convention: libtorch-{variant}-{version}.dev{YYYYMMDD}+{cuda_version}.zip [3]

Breaking down your example, libtorch-shared-with-deps-2.8.0.dev20250507+cu128.zip:

1. Prefix (libtorch-shared-with-deps): This indicates the LibTorch distribution type. Common variants include shared-with-deps and shared-without-deps [1][4].
2. Version (2.8.0): The base PyTorch release version under development [3].
3. Dev Date Suffix (.devYYYYMMDD): Represents the development snapshot date (e.g., dev20250507 corresponds to May 7, 2025) [3].
4. CUDA Suffix (+cu128): Specifies the CUDA toolkit version (e.g., cu128 for CUDA 12.8) [3].

Key points regarding these files:

- Latest Builds: PyTorch provides "latest" symlink versions (e.g., libtorch-shared-with-deps-latest.zip) to always point to the most recent daily build, which is recommended for most users [5][3].
- ABI Conventions: In the past, binaries explicitly tagged with cxx11-abi were common. However, as PyTorch has standardized its C++ ABI, these tags are often omitted in newer nightly builds [3].
- Official Source: All official binaries are hosted at https://download.pytorch.org/libtorch/ [5][6]. Specific nightly builds are categorized by their CUDA or CPU configuration under the /nightly/ directory (e.g., download.pytorch.org/libtorch/nightly/cu128/) [5][4][3].

For the most reliable access, it is recommended to use the "latest" link provided on the official PyTorch Get Started page, as specific versioned filenames are snapshots that change daily [5][3].

Preserve the nightly .devYYYYMMDD suffix in the libtorch URL
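torch.__version__.split(".dev", 1)[0] drops everything from .dev onward, so a nightly version such as 2.8.0.dev20250507+cu128 is reduced to 2.8.0 and then rebuilt as 2.8.0+cu128 from torch.version.cuda. The resulting URL names a stable-release archive instead of the nightly one. Affected lines: source/install/package_c.sh:126-137.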
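
One way to keep the nightly suffix is sketched below as a pure function, so it can be exercised without importing torch. It is an illustration, not the fix adopted in the PR: the function name `libtorch_url` is hypothetical, and the `nightly/` path prefix is taken from the web query results quoted in this thread, which are not verified here.

```python
from urllib.parse import quote


def libtorch_url(torch_version, cuda_version=None):
    """Build a libtorch archive URL from torch.__version__-style strings.

    torch_version is e.g. "2.8.0.dev20250507+cu128" or "2.7.1";
    cuda_version is e.g. "12.8" or None (as torch.version.cuda would report).
    Returns None when the variant is neither "cpu" nor "cu*".
    """
    # Split off the local version label without touching any ".dev" suffix.
    base, _, local = torch_version.partition("+")
    if local:
        variant = local
    else:
        variant = ("cu" + cuda_version.replace(".", "")) if cuda_version else "cpu"
    if variant != "cpu" and not variant.startswith("cu"):
        return None
    # Assumption: nightly archives sit under a "nightly/" prefix, as the
    # quoted web query result claims.
    channel = "nightly/" if ".dev" in base else ""
    full_version = f"{base}+{variant}"
    return (
        f"https://download.pytorch.org/libtorch/{channel}{variant}/"
        f"libtorch-shared-with-deps-{quote(full_version, safe='')}.zip"
    )


if __name__ == "__main__":
    print(libtorch_url("2.8.0.dev20250507+cu128"))
    print(libtorch_url("2.7.1", "12.8"))
```

Returning None for unsupported variants preserves the existing behavior of printing no URL, so the surrounding shell logic would not need to change.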

Comment on lines +9 to +17
find_package(CUDAToolkit REQUIRED)
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES all)
  # CUDA 12.9 CCCL fails to compile CUB/Thrust with -arch=all.
  if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL "12.9" AND CUDAToolkit_VERSION
                                                          VERSION_LESS "13.0")
    set(CMAKE_CUDA_ARCHITECTURES all-major)
  else()
    set(CMAKE_CUDA_ARCHITECTURES all)
  endif()
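
The version guard quoted above selects all-major only for toolkit versions in the half-open range from 12.9 to 13.0. A small Python model of that rule makes the boundary easy to inspect; the helper names `parse_version` and `default_cuda_architectures` are hypothetical, and the sketch mirrors the quoted condition rather than any later revision of it.

```python
def parse_version(text, width=4):
    """Parse "12.9.86" into a fixed-width tuple, padding with zeros."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def default_cuda_architectures(toolkit_version):
    """Mirror the quoted guard: all-major for 12.9 <= version < 13.0, else all."""
    version = parse_version(toolkit_version)
    if parse_version("12.9") <= version < parse_version("13.0"):
        return "all-major"
    return "all"


if __name__ == "__main__":
    for candidate in ("12.8.93", "12.9.86", "13.0.48"):
        print(candidate, default_cuda_architectures(candidate))
```

Under this rule any 13.x toolkit falls back to all, which is the case the review comment asks to reconsider.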

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does CUDA 13.0 or later still fail to compile CUB/Thrust with -arch=all (CCCL issue)?

💡 Result:

Yes, compilation issues related to the use of -arch=all with CUDA C++ Core Compute Libraries (CCCL), which includes Thrust and CUB, have been reported in the CUDA 13.x timeframe [1][2][3]. Specifically, issues have stemmed from the preprocessor reaching token limits or encountering macro expansion errors when generating the architecture-specific namespace strings required by CCCL headers when -arch=all is used [1].

For instance, a known bug reported in 2025 involved an error where the _CCCL_PP_SPLICE_WITH_IMPL macro failed because it was passed more arguments than it was designed to handle when processing the extensive list of architectures enabled by -arch=all [1]. While fixes have been implemented for such specific preprocessor limitations (e.g., in CCCL PR #4972), users may continue to encounter various compilation challenges when using -arch=all due to the sheer complexity of the generated headers or other edge-case interactions within the CCCL preprocessor machinery [1][3].

Additionally, users should be aware that the general approach of using -arch=all to create universal binaries has been discouraged or has faced functional limitations in recent CUDA versions, as the resulting binaries may not be truly universal due to PTX versioning, driver compatibility requirements, and the removal of support for older architectures (e.g., Maxwell, Pascal, and Volta support was removed in CUDA 13.0) [4][5]. It is generally recommended to explicitly target the required architectures for a specific deployment environment rather than relying on -arch=all [4].

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== File context ==\n'
git ls-files 'source/lib/src/gpu/CMakeLists.txt'
wc -l source/lib/src/gpu/CMakeLists.txt
sed -n '1,80p' source/lib/src/gpu/CMakeLists.txt

printf '\n== Search for CUDA architecture handling ==\n'
rg -n "CMAKE_CUDA_ARCHITECTURES|all-major|all" source/lib/src/gpu source/lib -g '!**/build/**' || true

Repository: deepmodeling/deepmd-kit

Length of output: 50379


🌐 Web query:

CUDA 13.0 release notes support for older architectures removed CUB Thrust -arch=all CCCL preprocessor issue

💡 Result:

In CUDA Toolkit 13.0, NVIDIA removed support for several older GPU architectures and introduced significant changes to the CUDA Core Compute Libraries (CCCL).

GPU Architecture Support

CUDA 13.0 removed offline compilation and library support for GPU architectures prior to the Turing generation [1][2][3]. This includes Maxwell, Pascal, and Volta architectures (compute capabilities earlier than 7.5) [2][4][3]. These architectures are now considered feature-complete, and developers must use CUDA Toolkit 12.x series to build applications targeting them [1][5].

CCCL and Preprocessor Issues

With the introduction of CCCL 3.0 in CUDA 13.0, several changes and known issues occurred:

1. Header Reorganization: CCCL headers were moved to a new top-level directory structure (${CTK_ROOT}/include/cccl/...) to prevent conflicts [6][7]. For projects not using CMake with the CCCL::CCCL target, developers must manually add the new include path to their build systems [6][7].
2. CCCL Preprocessor Issue with -arch=all: A bug was identified where using nvcc with the -arch=all flag while including CUB or Thrust headers caused compilation errors [8]. The issue stemmed from an overflow in the CCCL preprocessor macro _CCCL_PP_SPLICE_WITH_IMPLx, which could not handle the growing number of GPU architectures [8]. A fix was implemented in the CCCL repository (PR #4972) to increase the argument capacity of this macro [8].
3. Other Compatibility Issues: Additional issues were reported when compiling with CUDA 13.0 nvcc in host-compiled files (.cpp) when using specific system definitions, such as THRUST_DEVICE_SYSTEM_OMP [9]. These were generally addressed by patching guard clauses in the affected PTX helper headers within the CCCL library [9].

Developers are encouraged to consult the official CCCL 3.0 migration guide for detailed instructions on updating build paths and handling these breaking changes [6][7].

Citations:


🌐 Web query:

CUDA 13.2 release notes CCCL -arch=all compiler flag issue Thrust CUB

💡 Result:

There is no documented "issue" specifically named the "-arch=all compiler flag issue" in the official CUDA 13.2 release notes for CCCL, Thrust, or CUB [1]. However, the use of -arch=all has been associated with preprocessor and macro-related compilation errors in previous versions of the CUDA Core Compute Library (CCCL) [2]. Specifically, users have reported that when compiling with nvcc using -arch=all while including CUB headers (e.g., cub/cub.cuh), the compiler may fail due to limitations in internal CCCL preprocessor macros (such as _CCCL_PP_SPLICE_WITH) that are responsible for handling the increasingly large list of CUDA architectures [2].

If you are encountering compilation errors with -arch=all in CUDA 13.2, they are likely related to:

1. Macro Limitations: The preprocessor may struggle to build the architecture string if the number of supported architectures exceeds the capacity of specific internal macros (a known issue that has been patched in previous CCCL versions) [2].
2. Environment-Specific Errors: While not specifically tied to -arch=all, users have reported that CUB code (e.g., device_transform.cuh) can encounter issues with specific host compilers (like GCC on ARM/SBSA) in CUDA 13.2 due to non-conforming C++ syntax that nvcc may silently accept but host compilers treat as errors [3].

To resolve or investigate such issues, it is recommended to:

- Check the official NVIDIA CCCL GitHub repository for current open issues and discussions, as these are often tracked there rather than in the general CUDA Toolkit release notes [4][2][3].
- If you are building for a specific set of hardware, replace -arch=all with more explicit architecture flags (e.g., -arch=sm_80) to reduce the complexity of the architecture string handled by the preprocessor [5].
- Ensure you are using the latest patch release for the CUDA Toolkit, as patch releases often contain fixes for library-specific compilation issues [1].

Citations:


Extend the all-major fallback to CUDA 13.x

The `< 13.0` cutoff is too narrow. CUDA 13.x still has reported CCCL/Thrust/CUB compile failures with `-arch=all`, so falling back to `all` for CUDA 13.x can reintroduce the build breakage this guard is meant to avoid.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/lib/src/gpu/CMakeLists.txt` around lines 9 - 17, The CUDA architecture
fallback in CMakeLists.txt is too narrow because the `all-major` branch in the
`find_package(CUDAToolkit)` / `CMAKE_CUDA_ARCHITECTURES` logic stops at CUDA
12.x. Update the version check so the `all-major` fallback also applies to CUDA
13.x, keeping the existing `CUDAToolkit_VERSION` guard and the
`set(CMAKE_CUDA_ARCHITECTURES ...)` logic in place while extending the upper
bound appropriately.
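
For illustration only, the widened guard might look roughly like the sketch below. The actual contents of `source/lib/src/gpu/CMakeLists.txt` are not shown in this thread, so the overall shape of the check, the `12.9` lower bound (inferred from the PR summary), and the `14.0` upper bound are all assumptions rather than the PR's code. The `all` and `all-major` values of `CMAKE_CUDA_ARCHITECTURES` require CMake 3.23 or newer.

```cmake
# Hypothetical sketch; the real guard in source/lib/src/gpu/CMakeLists.txt may differ.
find_package(CUDAToolkit REQUIRED)

# Assumption: the existing check covers [12.9, 13.0). Raising the upper bound
# to 14.0 keeps CUDA 13.x on all-major instead of falling back to all.
if(CUDAToolkit_VERSION VERSION_GREATER_EQUAL "12.9"
   AND CUDAToolkit_VERSION VERSION_LESS "14.0")
  set(CMAKE_CUDA_ARCHITECTURES all-major)
else()
  set(CMAKE_CUDA_ARCHITECTURES all)
endif()
```

Whether a fixed upper bound is wanted at all is a design choice: dropping it would apply `all-major` to every toolkit from the lower bound onward, while keeping it means the guard needs revisiting at the next major CUDA release.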

@codecov

codecov Bot commented Jul 1, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.08%. Comparing base (ee30956) to head (bbcd58d).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5706      +/-   ##
==========================================
- Coverage   81.20%   81.08%   -0.12%     
==========================================
  Files         978      980       +2     
  Lines      108858   109350     +492     
  Branches     4139     4207      +68     
==========================================
+ Hits        88393    88669     +276     
- Misses      18946    19156     +210     
- Partials     1519     1525       +6     


@njzjz njzjz marked this pull request as draft July 2, 2026 05:38