feat!: align runtime API and add runtime dispatch #11

Open
voltjia wants to merge 15 commits into master from fix/runtime-api-alignment

voltjia (Collaborator) commented Jul 1, 2026

Summary

  • Move CUDA Runtime API-shaped public wrappers under infini::rt::runtime, keeping infini::rt free for InfiniRT-specific APIs.
  • Add top-level infini::rt::set_runtime_device_type and infini::rt::runtime_device_type for runtime dispatch between enabled backends.
  • Normalize runtime memcpy-kind constants to kMemcpy... names and remove the non-k aliases from the public API surface.
  • Keep the generated runtime API declaration list in scripts/generate_public_headers.py for now, instead of parsing CUDA docs or headers.
  • Constrain TensorView's tensor-like constructor and add focused test_tensor_view coverage.
  • Update README examples for the new infini::rt::runtime namespace and runtime-device selector APIs.
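
As an illustration of the TensorView bullet above, the following is a minimal sketch of what constraining a tensor-like constructor can look like in C++17. It is not the code in this PR: the `IsTensorLike` trait, the `data()`/`shape()` requirements, the `TensorView` members, and `FakeTensor` are all hypothetical stand-ins chosen for the example.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: a type is "tensor-like" if it exposes data() and shape().
template <typename T, typename = void>
struct IsTensorLike : std::false_type {};

template <typename T>
struct IsTensorLike<T, std::void_t<decltype(std::declval<const T&>().data()),
                                   decltype(std::declval<const T&>().shape())>>
    : std::true_type {};

// Hypothetical stand-in for TensorView; the real class may differ.
class TensorView {
 public:
  TensorView(const void* data, std::vector<std::size_t> shape)
      : data_(data), shape_(std::move(shape)) {}

  // Constrained constructor: it takes part in overload resolution only for
  // tensor-like types other than TensorView itself, so it cannot swallow
  // unrelated arguments such as int or std::vector<float>.
  template <typename Tensor,
            typename = std::enable_if_t<
                IsTensorLike<Tensor>::value &&
                !std::is_same_v<std::decay_t<Tensor>, TensorView>>>
  TensorView(const Tensor& tensor)
      : TensorView(tensor.data(), ToShape(tensor.shape())) {}

  const void* data() const { return data_; }
  const std::vector<std::size_t>& shape() const { return shape_; }

 private:
  // Copies any iterable shape into the view's own shape vector.
  template <typename Shape>
  static std::vector<std::size_t> ToShape(const Shape& shape) {
    return std::vector<std::size_t>(shape.begin(), shape.end());
  }

  const void* data_;
  std::vector<std::size_t> shape_;
};

// Hypothetical tensor-like type used only for this demonstration.
struct FakeTensor {
  std::vector<float> storage;
  std::vector<std::size_t> dims;
  const float* data() const { return storage.data(); }
  const std::vector<std::size_t>& shape() const { return dims; }
};

static_assert(std::is_constructible_v<TensorView, const FakeTensor&>);
static_assert(!std::is_constructible_v<TensorView, int>);
static_assert(!std::is_constructible_v<TensorView, std::vector<float>>);
```

The `static_assert` lines are the kind of check a focused test could make: a type with both accessors is accepted, while `int` and a bare `std::vector<float>` (which has `data()` but no `shape()`) are rejected at compile time.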

Motivation

The latest interface guide requires the lowest-level C++ runtime API to match CUDA Runtime API signatures after only Google C++ Style naming conversion. Keeping those CUDA-shaped APIs under infini::rt::runtime makes that contract explicit, while the outer infini::rt namespace can host InfiniRT-specific dispatch APIs without worrying about current or future CUDA symbol conflicts.
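
To make the naming rule concrete, here is a small sketch of the conversion as described above, assuming it amounts to dropping the `cuda` prefix, keeping functions in PascalCase, and giving enumerators a `k` prefix. Only the `kMemcpy...` pattern is confirmed by this PR; the helper `ToGoogleStyleName`, the `SymbolKind` enum, and the resulting function name `Memcpy` are assumptions for illustration and are not part of InfiniRT.

```cpp
#include <cctype>
#include <string>

// Hypothetical classification of a CUDA Runtime API symbol.
enum class SymbolKind { kFunction, kEnumerator };

// Hypothetical helper that illustrates the naming rule only.
std::string ToGoogleStyleName(const std::string& cuda_name, SymbolKind kind) {
  const std::string prefix = "cuda";
  std::string stem = cuda_name;
  // Drop the leading "cuda" if present.
  if (stem.compare(0, prefix.size(), prefix) == 0) {
    stem.erase(0, prefix.size());
  }
  if (stem.empty()) {
    return stem;
  }
  // Make sure the remaining name starts with an upper-case letter.
  stem[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(stem[0])));
  // Enumerators get the "k" constant prefix; functions stay as they are.
  return kind == SymbolKind::kEnumerator ? "k" + stem : stem;
}
```

Under these assumptions, `cudaMemcpyHostToDevice` maps to `kMemcpyHostToDevice`, and `cudaMemcpy` would map to a function named `Memcpy` inside infini::rt::runtime.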

Companion InfiniOps PR: InfiniTensor/InfiniOps#787

Type of Change

  • feat - new feature / new operator / new platform
  • fix - bug fix
  • perf - performance improvement (no behavioral change)
  • refactor - code restructuring without behavior change
  • test - adding or fixing tests only
  • docs - documentation only
  • build / ci - build system or CI configuration
  • chore - tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Smoke Test Result

# Host validation on ssh nvidia, outside Docker
export PATH=/usr/local/cuda/bin:$HOME/.local/bin:$PATH
cmake -S /tmp/infinirt-host-bab8/src -B /tmp/infinirt-host-bab8/build \
  -DWITH_CPU=ON -DWITH_NVIDIA=ON -DINFINI_RT_BUILD_TESTING=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=/tmp/infinirt-host-bab8/prefix
cmake --build /tmp/infinirt-host-bab8/build -j8
ctest --test-dir /tmp/infinirt-host-bab8/build --output-on-failure
cmake --install /tmp/infinirt-host-bab8/build

100% tests passed, 0 tests failed out of 8

# accelerator-dev/nvidia:latest, with companion InfiniOps branch
InfiniOps no-Torch InfiniLM operator subset wheel: built and installed successfully.
InfiniOps representative PyTorch wrapper smoke (`abs,clamp,exp`): built and installed successfully.

Test Results on Supported Platforms

| Platform | Affected | Build / Smoke Result | Full Result / Notes |
| --- | --- | --- | --- |
| NVIDIA | yes | smoke passed | Host CPU+NVIDIA CMake build, CTest 8/8, install, and install-consumer passed. Companion InfiniOps no-Torch and representative PyTorch smoke builds passed in accelerator-dev/nvidia:latest. |
| Iluvatar | yes | N/A | Not tested: no Iluvatar hardware/toolkit in the validation environment. |
| MetaX | yes | N/A | Not tested: no MetaX hardware/toolkit in the validation environment. |
| Cambricon | yes | N/A | Not tested: no Cambricon hardware/toolkit in the validation environment. |
| Moore | yes | N/A | Not tested: no Moore hardware/toolkit in the validation environment. |
| Ascend | yes | N/A | Not tested: no Ascend hardware/toolkit in the validation environment. |

Full `pytest` output (optional)
N/A

Benchmark / Performance Impact

N/A. This PR changes runtime API shape and dispatch plumbing; no performance-sensitive kernels are changed.

Notes for Reviewers

  • The lowest-level CUDA-shaped APIs are now infini::rt::runtime::*; the outer infini::rt namespace currently only adds runtime-device selector APIs.
  • The generated public header declarations are still driven by a short explicit list in the generator script. Parsing CUDA docs or cuda_runtime.h is intentionally left out of this PR to keep scope contained.
  • Full InfiniCore/InfiniLM inference validation was attempted, but the remote environment repeatedly failed while downloading xmake's Boost dependency from GitHub. The failure occurred before InfiniCore compilation and appears network-related. The relevant InfiniRT and InfiniOps compatibility checks listed above passed.
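
The namespace split described in the first note can be pictured with the self-contained sketch below. Only the names infini::rt::runtime, set_runtime_device_type, runtime_device_type, and the `kMemcpy...` pattern come from this PR. The `DeviceType` enumerators, the `Error` type, the `Memcpy` signature, and the process-wide selector storage are assumptions made so the example compiles on its own; the real headers may differ.

```cpp
#include <cstddef>
#include <cstring>

namespace infini::rt {

// Hypothetical device-type values; the PR does not list the real ones.
enum class DeviceType { kCpu, kNvidia };

namespace detail {
// Assumed process-wide selector state (the real storage may be different).
inline DeviceType& CurrentDeviceType() {
  static DeviceType current = DeviceType::kCpu;
  return current;
}
}  // namespace detail

// InfiniRT-specific selector APIs live in the outer namespace.
inline void set_runtime_device_type(DeviceType type) {
  detail::CurrentDeviceType() = type;
}
inline DeviceType runtime_device_type() { return detail::CurrentDeviceType(); }

namespace runtime {

// CUDA-shaped names live here; only the k-prefixed constants are public.
enum MemcpyKind {
  kMemcpyHostToHost,
  kMemcpyHostToDevice,
  kMemcpyDeviceToHost,
  kMemcpyDeviceToDevice,
  kMemcpyDefault
};

// Hypothetical error codes for this sketch.
enum Error { kSuccess = 0, kErrorNotSupported };

// Dispatches on the selected device type. Only the CPU path is implemented
// here; a real NVIDIA backend would forward to the CUDA Runtime instead.
inline Error Memcpy(void* dst, const void* src, std::size_t count,
                    MemcpyKind kind) {
  (void)kind;  // The CPU path treats every kind as a host-to-host copy.
  switch (runtime_device_type()) {
    case DeviceType::kCpu:
      std::memcpy(dst, src, count);
      return kSuccess;
    case DeviceType::kNvidia:
      return kErrorNotSupported;
  }
  return kErrorNotSupported;
}

}  // namespace runtime
}  // namespace infini::rt
```

In this sketch a caller would select a backend once with `infini::rt::set_runtime_device_type(...)` and then use `infini::rt::runtime::Memcpy(dst, src, count, infini::rt::runtime::kMemcpyHostToHost)`. Because the CUDA-shaped names sit in the inner namespace, a selector added to infini::rt cannot collide with them.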

BREAKING CHANGE: CUDA-shaped runtime APIs move to infini::rt::runtime, and non-k runtime memcpy-kind aliases are removed from the normalized runtime API surface.

voltjia changed the title from "feat!: align runtime API and add default dispatch" to "feat!: align runtime API and add runtime dispatch" on Jul 2, 2026
voltjia marked this pull request as ready for review on July 2, 2026 at 06:04