feat: run the OpenXLA backend on CUDA (GB10) via a source-built IREE runtime (#449) by inureyes · Pull Request #461 · lablup/mlxcel

inureyes · 2026-06-27T15:31:19Z

Summary

The OpenXLA backend (issue #449) now runs on the GB10 GPU. `MLXCEL_BACKEND=xla MLXCEL_XLA_DEVICE=cuda mlxcel generate` drives Llama-3.2-1B through IREE on CUDA, token-exact (48/48 tokens) against the HF temp-0 (greedy) reference, at ~5 tok/s (~2.6x the CPU path).

Why a source build

The prebuilt IREE dist is CPU/Vulkan only: no cuda driver, and its iree-compile has no cuda codegen.
Vulkan via the dist does not work on the GB10: IREE's Vulkan allocator cannot allocate against NVIDIA's Grace-Blackwell unified memory.
So CUDA is the GPU path, using a source-built cuda-enabled IREE runtime plus a cuda-capable iree-compile, version-matched to each other. It is a separate, mutually-exclusive build mode keyed on IREE_CUDA_HOME, so the merged CPU (IREE_DIST) path is unchanged, and so is CI, which builds neither.
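The three build configurations described above (neither variable set, the prebuilt CPU dist, or the source-built cuda runtime) can be modeled as a small selection function. This is a minimal sketch: the env var names IREE_DIST and IREE_CUDA_HOME come from this PR, while the enum, the function name, and treating "both set" as an error are assumptions (the real build.rs may instead give one variable precedence).

```rust
/// Hypothetical model of the mutually-exclusive IREE build modes.
#[derive(Debug, PartialEq)]
enum IreeBuildMode {
    /// Neither variable set (e.g. CI): no IREE runtime is linked.
    None,
    /// Prebuilt CPU dist rooted at the given path (IREE_DIST).
    Dist(String),
    /// Source-built cuda-enabled runtime rooted at the given path (IREE_CUDA_HOME).
    Cuda(String),
}

/// Pick the build mode from the two (optional) env var values.
/// Setting both is rejected here; that policy is an assumption of this sketch.
fn select_mode(dist: Option<&str>, cuda_home: Option<&str>) -> Result<IreeBuildMode, String> {
    match (dist, cuda_home) {
        (None, None) => Ok(IreeBuildMode::None),
        (Some(d), None) => Ok(IreeBuildMode::Dist(d.to_string())),
        (None, Some(c)) => Ok(IreeBuildMode::Cuda(c.to_string())),
        (Some(_), Some(_)) => Err(
            "IREE_DIST and IREE_CUDA_HOME are mutually exclusive; set only one".to_string(),
        ),
    }
}
```

In a build script this would be fed from `std::env::var("IREE_DIST").ok().as_deref()` and the IREE_CUDA_HOME equivalent, together with `cargo:rerun-if-env-changed` lines for both variables so a mode switch triggers a rebuild.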

What changed

mlxcel-xla/build.rs: cuda mode (IREE_CUDA_HOME) compiles the shim against the source runtime headers with XLA_GATE_CUDA and bakes the cuda iree-compile path (IREE_CUDA_COMPILE, overridable at runtime via MLXCEL_XLA_IREE_COMPILE).
csrc/xla_iree.c: registers the cuda driver explicitly (guarded by XLA_GATE_CUDA; the unified runtime bundles only the local-task registration).
root build.rs: the cuda runtime link recipe. The source-built unified archive already bundles the cuda driver impl, so it alone is whole-archived, linked alongside the registration wrapper, IREE's vendored printf, and flatcc. The cuda driver dlopens libcuda at runtime, so no link-time -lcuda is needed.
src/iree.rs: cuda target flags, cuda iree-compile sourcing, and the compiler path in the vmfb cache key.
runtime.rs: suppress the MLX CPU-fallback footgun warning when the XLA backend is selected (it drives inference through IREE, not MLX, so the message is misleading).
spike/iree-ffi: the same cuda mode that proved the path token-exact before it was productized.
README: the source-runtime build recipe.
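The iree-compile sourcing and the vmfb cache-key change in the list above fit together: the resolved compiler path feeds the key. The sketch below assumes the baked path reaches the crate as a compile-time string (for instance via `cargo:rustc-env` plus `option_env!`, an assumed mechanism) and that empty values count as unset; the function names are hypothetical, and only the env var names come from this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Pick the iree-compile binary for cuda mode: a non-empty runtime override
/// (MLXCEL_XLA_IREE_COMPILE) wins over the path baked in at build time
/// (IREE_CUDA_COMPILE). Empty strings are treated as unset (an assumption).
fn resolve_iree_compile(runtime_override: Option<&str>, baked: Option<&str>) -> Option<PathBuf> {
    runtime_override
        .filter(|s| !s.is_empty())
        .or(baked.filter(|s| !s.is_empty()))
        .map(PathBuf::from)
}

/// Hypothetical vmfb cache key. Hashing the compiler path together with the
/// MLIR text and the target flags means a module produced by one iree-compile
/// is never looked up for a build that uses a different one.
fn vmfb_cache_key(mlir: &str, target_flags: &[&str], compiler: &Path) -> String {
    let mut h = DefaultHasher::new();
    mlir.hash(&mut h);
    target_flags.hash(&mut h);
    compiler.hash(&mut h);
    format!("{:016x}", h.finish())
}
```

One design caveat: `DefaultHasher`'s output is not guaranteed to stay the same across Rust releases, so an on-disk cache that must survive toolchain upgrades would be better served by a stable digest such as SHA-256.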

How to run (local)

```sh
# build the cuda-enabled IREE runtime from source (runtime only, no LLVM); see the crate README
export IREE_CUDA_HOME=/path/to/iree IREE_CUDA_COMPILE=/path/to/cuda/iree-compile
cargo build --release --features xla-iree
MLXCEL_BACKEND=xla MLXCEL_XLA_DEVICE=cuda ./target/release/mlxcel generate -m <llama-3.2-1b> -p "..." -n 48
```
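The run command selects the backend and device purely through MLXCEL_BACKEND and MLXCEL_XLA_DEVICE. As a hedged sketch of how those values could be interpreted, including the rule that the MLX CPU-fallback warning stays quiet under the XLA backend: the variable names and the values "xla" and "cuda" come from this PR, while the enum, the "cpu" value, the defaults, and all function names are assumptions.

```rust
/// Hypothetical device selector for MLXCEL_XLA_DEVICE.
#[derive(Debug, Clone, Copy, PartialEq)]
enum XlaDevice {
    Cpu,
    Cuda,
}

/// True when MLXCEL_BACKEND selects the OpenXLA/IREE backend.
fn xla_selected(backend: Option<&str>) -> bool {
    matches!(backend.map(str::trim), Some(b) if b.eq_ignore_ascii_case("xla"))
}

/// Parse MLXCEL_XLA_DEVICE; an unset or empty value is assumed to mean CPU.
fn parse_xla_device(value: Option<&str>) -> Result<XlaDevice, String> {
    match value.map(str::trim) {
        None | Some("") | Some("cpu") => Ok(XlaDevice::Cpu),
        Some("cuda") => Ok(XlaDevice::Cuda),
        Some(other) => Err(format!(
            "unsupported MLXCEL_XLA_DEVICE={other:?}; expected cpu or cuda"
        )),
    }
}

/// cuda is only usable from a cuda-mode build (IREE_CUDA_HOME), the only mode
/// in which the shim registers the cuda driver.
fn check_device(device: XlaDevice, cuda_mode_build: bool) -> Result<XlaDevice, String> {
    if device == XlaDevice::Cuda && !cuda_mode_build {
        return Err("MLXCEL_XLA_DEVICE=cuda needs a build with IREE_CUDA_HOME set".to_string());
    }
    Ok(device)
}

/// The MLX CPU-fallback warning only describes reality when MLX runs inference;
/// with the XLA backend, generation goes through IREE instead, so stay quiet.
fn should_warn_mlx_cpu_fallback(backend: Option<&str>, mlx_on_cpu: bool) -> bool {
    mlx_on_cpu && !xla_selected(backend)
}
```

Failing fast in `check_device` surfaces a mismatched build at startup rather than as a missing-driver error deep inside the IREE runtime.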

Scope

Still Llama-3.2-1B only, with prompts capped at the 256-token bucket, greedy decoding, and a single sequence (batch 1). Higher throughput needs batched graphs and a multi-sequence session; that is a follow-up.
The cuda runtime build is a local artifact (not committed); the recipe is in the README.
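The 256-token prompt bucket in the scope above implies a fixed-length prefill input. A minimal sketch of what bucketing means, assuming (not confirmed here) that the prefill graph is compiled with a static 256-position input so shorter prompts are right-padded and longer ones rejected; the constant, pad id, and function name are hypothetical.

```rust
/// Bucket size from the PR's scope note; the padding scheme is an assumption.
const PROMPT_BUCKET: usize = 256;

/// Right-pad prompt token ids to the bucket length. Returns the padded ids and
/// the true prompt length, so the caller knows which position's logits
/// (len - 1) predict the first generated token.
fn pad_to_bucket(tokens: &[u32], pad_id: u32) -> Result<(Vec<u32>, usize), String> {
    if tokens.is_empty() {
        return Err("empty prompt".into());
    }
    if tokens.len() > PROMPT_BUCKET {
        return Err(format!(
            "prompt has {} tokens; the compiled graph accepts at most {}",
            tokens.len(),
            PROMPT_BUCKET
        ));
    }
    let mut padded = Vec::with_capacity(PROMPT_BUCKET);
    padded.extend_from_slice(tokens);
    padded.resize(PROMPT_BUCKET, pad_id);
    Ok((padded, tokens.len()))
}
```

Fixed buckets trade some wasted compute on short prompts for a single precompiled vmfb per bucket, instead of recompiling for every prompt length.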

Validation

dist (CPU) mode still builds (no regression); cuda mode builds, links, and runs end to end on the GB10 at ~5 tok/s.
default and xla-backend builds, cargo fmt --check, and cargo clippy (default with -D warnings, plus xla-backend) are clean; cargo test --features xla-backend --lib backend:: passes 6/6.

Refs #449.

…runtime (#449)

The OpenXLA backend (issue #449) now runs on the GB10 GPU. `MLXCEL_BACKEND=xla MLXCEL_XLA_DEVICE=cuda mlxcel generate` drives Llama-3.2-1B through IREE on CUDA, token-exact (48/48) vs HF temp-0 at about 5 tok/s, roughly 2.6x the CPU path.

The prebuilt IREE dist is CPU/Vulkan only (no cuda driver, and its iree-compile has no cuda codegen), and Vulkan through that dist does not work on the GB10 (IREE's Vulkan allocator cannot allocate against NVIDIA's Grace-Blackwell unified memory). CUDA is therefore the GPU path, using a source-built cuda-enabled IREE runtime plus a cuda-capable iree-compile, version-matched to each other. This is a separate, mutually-exclusive build mode keyed on IREE_CUDA_HOME, so the merged CPU (IREE_DIST) path is unchanged and CI, which builds neither, stays green.

mlxcel-xla/build.rs gains a cuda mode (IREE_CUDA_HOME): it compiles the shim against the source runtime headers with XLA_GATE_CUDA defined and bakes the cuda iree-compile path (IREE_CUDA_COMPILE, overridable at runtime via MLXCEL_XLA_IREE_COMPILE). The C shim registers the cuda driver explicitly, since the unified runtime bundles only the local-task registration.

The runtime link recipe lives in the root build.rs, because a dependency's link-args do not propagate to the binary. The source-built unified archive already bundles the cuda driver impl, so it is whole-archived alone, with the cuda registration wrapper, IREE's vendored printf (the unified printf.c.o needs vsnprintf_), and flatcc linked in a group. The cuda driver dlopens libcuda at runtime, so no link-time -lcuda is needed.

src/iree.rs adds the cuda target flags, sources iree-compile from MLXCEL_XLA_IREE_COMPILE in cuda mode, and includes the compiler path in the vmfb cache key so a cuda vmfb is never reused for a cpu build. The source-runtime build recipe is documented in the crate README.

The MLX CPU-fallback footgun warning is suppressed when the OpenXLA backend is selected: that backend drives inference through IREE, not MLX, and can run on the GPU, so the message would otherwise be misleading. spike/iree-ffi gains the same cuda mode (IREE_CUDA_HOME) that proved the path token-exact before it was productized.

Validation: the dist (CPU) mode still builds (no regression); the cuda mode builds, links, and runs end to end on the GB10 at about 5 tok/s; default and xla-backend builds, fmt, and clippy are clean; backend tests pass. Refs #449.
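The link recipe in the commit message (whole-archive the unified runtime alone, group the remaining archives, no -lcuda) maps onto `cargo:rustc-link-arg` directives, which Cargo applies only to the package's own targets; that is consistent with the recipe having to live in the root build.rs. A sketch under stated assumptions: every archive file name below is a hypothetical placeholder, and GNU ld/lld-style `-Wl,` flags are assumed.

```rust
/// Build the cargo directives for the cuda runtime link recipe.
/// Archive names are hypothetical; only the overall shape follows the PR.
fn cuda_link_directives(lib_dir: &str) -> Vec<String> {
    let args = vec![
        // Keep every object of the unified runtime, including the cuda driver impl,
        // even though nothing references some of them directly.
        "-Wl,--whole-archive".to_string(),
        format!("{lib_dir}/libiree_runtime_unified.a"),
        "-Wl,--no-whole-archive".to_string(),
        // Let the remaining archives resolve symbols against each other in any order.
        "-Wl,--start-group".to_string(),
        format!("{lib_dir}/libiree_cuda_registration.a"),
        format!("{lib_dir}/libiree_vendored_printf.a"),
        format!("{lib_dir}/libflatcc_runtime.a"),
        "-Wl,--end-group".to_string(),
    ];
    // Deliberately no "-lcuda": the cuda driver dlopens libcuda at run time.
    args.into_iter()
        .map(|a| format!("cargo:rustc-link-arg={a}"))
        .collect()
}
```

In build.rs the result would simply be printed, one directive per line, e.g. `for d in cuda_link_directives(&lib_dir) { println!("{d}"); }`.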

inureyes added the type:enhancement, priority:medium, and area:architecture labels on Jun 27, 2026

inureyes merged commit a942cb1 into main Jun 27, 2026
5 checks passed

inureyes deleted the feat/449-cuda-productize branch June 27, 2026 15:32

inureyes merged 1 commit into main from feat/449-cuda-productize
