feat: scaffold the OpenXLA backend crate and seam (#449 Phase 3 M1) #458

Merged: inureyes merged 1 commit into main from feat/openxla-backend-scaffold-449 on Jun 27, 2026
@inureyes

Phase 3 M1 (#449): the default-off mlxcel-xla crate and the compute-backend seam wiring, the first step of integrating the OpenXLA / StableHLO backend behind the #448 inference-session contract.

  • New crate mlxcel-xla provides XlaInferenceSession. It implements the engine-neutral InferenceSession contract (capabilities / prefill / decode_step) plus the self-contained greedy drive loop (generate_greedy / generate_streaming_greedy). A backend that owns its KV and samples on-device uses that loop instead of threading an MLX model.
  • The seam gains a cfg-gated Backend::Xla / Session::Xla behind a new xla-backend feature; select_backend recognizes MLXCEL_BACKEND=xla, mirroring the experimental scaffold. XlaBackend::load_model returns an error, because the OpenXLA path drives generation through the session, not the MLX load boundary.
  • Execution is stubbed: prefill / decode_step return a clear not-wired error so the drive loop surfaces it rather than panicking. Binding the IREE runtime C API (load the compiled StableHLO vmfb, run prefill / decode_step) is the next milestone, together with threading the model directory through session creation.
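To make the stubbed-execution behavior concrete, here is a minimal sketch of a session whose prefill / decode_step return a not-wired error, plus a greedy drive loop that propagates it instead of panicking. Everything in it is an assumption for illustration: the trait shape, the u32 token type, the SessionError enum, StubSession, and the generate_greedy signature are hypothetical and are not taken from the mlxcel-xla source. The capabilities method is omitted for brevity.

```rust
use std::fmt;

/// Hypothetical error type; the real crate's error type is not shown in this PR.
#[derive(Debug, Clone, PartialEq)]
pub enum SessionError {
    /// The named entry point is not yet bound to a runtime.
    NotWired(&'static str),
}

impl fmt::Display for SessionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SessionError::NotWired(op) => write!(f, "{op} is not wired to a runtime yet"),
        }
    }
}

impl std::error::Error for SessionError {}

/// Illustrative stand-in for the engine-neutral contract (signatures assumed).
/// The session owns its KV state and samples on-device, so both calls
/// return the next token id directly.
pub trait InferenceSession {
    fn prefill(&mut self, prompt: &[u32]) -> Result<u32, SessionError>;
    fn decode_step(&mut self, last_token: u32) -> Result<u32, SessionError>;
}

/// A session whose execution is stubbed: every call reports "not wired".
pub struct StubSession;

impl InferenceSession for StubSession {
    fn prefill(&mut self, _prompt: &[u32]) -> Result<u32, SessionError> {
        Err(SessionError::NotWired("prefill"))
    }

    fn decode_step(&mut self, _last_token: u32) -> Result<u32, SessionError> {
        Err(SessionError::NotWired("decode_step"))
    }
}

/// Greedy drive loop: prefill once, then decode until EOS or the token budget
/// is reached. Errors from the session are returned with `?`, never unwrapped.
pub fn generate_greedy<S: InferenceSession>(
    session: &mut S,
    prompt: &[u32],
    max_new_tokens: usize,
    eos: u32,
) -> Result<Vec<u32>, SessionError> {
    let mut out = Vec::new();
    if max_new_tokens == 0 {
        return Ok(out);
    }
    let mut token = session.prefill(prompt)?;
    loop {
        out.push(token);
        if token == eos || out.len() >= max_new_tokens {
            break;
        }
        token = session.decode_step(token)?;
    }
    Ok(out)
}
```

With this shape, a caller that runs generate_greedy against StubSession gets an Err value it can log or report, which is the "surfaces it rather than panicking" behavior described above.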

Verified: the default build is unaffected (mlxcel-xla is an optional dep gated off, not compiled), cargo check --features xla-backend builds the lib and tests, fmt and clippy are clean, and the crate plus seam unit tests pass.

Refs #449.

Lands the default-off mlxcel-xla crate and wires it into the compute-backend
seam, the first step of integrating the OpenXLA / StableHLO compiler-family
backend behind the #448 inference-session contract.

mlxcel-xla provides XlaInferenceSession, which implements the engine-neutral
InferenceSession contract (capabilities, prefill, decode_step) and the
self-contained greedy drive loop (generate_greedy / generate_streaming_greedy)
that a backend owning its KV and sampling on-device uses instead of threading an
MLX model. The seam gains a cfg-gated Backend::Xla / Session::Xla behind a new
xla-backend feature, with select_backend recognizing MLXCEL_BACKEND=xla,
mirroring the experimental scaffold. XlaBackend::load_model errors (the OpenXLA
path drives generation through the session, not the MLX load boundary).
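As a rough illustration of the cfg-gated seam, the sketch below shows one way a backend enum and selector could recognize MLXCEL_BACKEND=xla only when the xla-backend feature is enabled. The Backend::Mlx variant name, the function signatures, the String error type, the default-to-MLX behavior when the variable is unset, and the error returned when the feature is off are all assumptions made for this example; only Backend::Xla, select_backend, the xla-backend feature, and MLXCEL_BACKEND=xla come from the description above.

```rust
/// Hypothetical backend enum; only the Xla variant is named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Mlx,
    /// Present only when the crate is built with `--features xla-backend`.
    #[cfg(feature = "xla-backend")]
    Xla,
}

/// Map an optional MLXCEL_BACKEND value to a backend.
/// Assumption: unset, empty, or "mlx" selects the default MLX backend.
pub fn select_backend(value: Option<&str>) -> Result<Backend, String> {
    match value {
        None | Some("") | Some("mlx") => Ok(Backend::Mlx),
        // Feature on: the variant exists, so "xla" is a valid choice.
        #[cfg(feature = "xla-backend")]
        Some("xla") => Ok(Backend::Xla),
        // Feature off: report a clear error (assumed behavior).
        #[cfg(not(feature = "xla-backend"))]
        Some("xla") => Err(
            "MLXCEL_BACKEND=xla requires building with --features xla-backend".to_string(),
        ),
        Some(other) => Err(format!("unknown MLXCEL_BACKEND value: {other}")),
    }
}

/// Convenience wrapper that reads the environment variable.
pub fn select_backend_from_env() -> Result<Backend, String> {
    select_backend(std::env::var("MLXCEL_BACKEND").ok().as_deref())
}
```

Taking the value as a parameter keeps the selector testable without mutating the process environment; the wrapper is the only place that touches std::env.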

Graph execution is stubbed: prefill / decode_step return a clear not-wired error,
so the drive loop surfaces it rather than panicking. Binding the IREE runtime C
API to load the compiled StableHLO vmfb and run prefill / decode_step is the next
milestone, along with threading the model directory through session creation.

Verified: the default build is unaffected (mlxcel-xla is an optional dep gated
off, not compiled), cargo check --features xla-backend builds the lib and tests,
fmt and clippy are clean, and the crate plus seam unit tests pass.

Refs #449.
@inureyes inureyes merged commit add7f1b into main Jun 27, 2026
@inureyes inureyes deleted the feat/openxla-backend-scaffold-449 branch June 27, 2026 00:53