feat: scaffold the OpenXLA backend crate and seam (#449 Phase 3 M1) #458
Merged
Lands the default-off mlxcel-xla crate and wires it into the compute-backend seam. This is the first step of integrating the OpenXLA / StableHLO compiler-family backend behind the #448 inference-session contract.

mlxcel-xla provides XlaInferenceSession, which implements the engine-neutral InferenceSession contract (capabilities, prefill, decode_step) and the self-contained greedy drive loop (generate_greedy / generate_streaming_greedy). A backend that owns its KV cache and samples on-device uses this loop instead of threading an MLX model.

The seam gains a cfg-gated Backend::Xla / Session::Xla behind a new xla-backend feature, and select_backend recognizes MLXCEL_BACKEND=xla, mirroring the experimental scaffold. XlaBackend::load_model returns an error, because the OpenXLA path drives generation through the session, not the MLX load boundary.

Graph execution is stubbed: prefill and decode_step return a clear not-wired error, so the drive loop surfaces that error rather than panicking. The next milestone is binding the IREE runtime C API to load the compiled StableHLO vmfb and run prefill / decode_step, along with threading the model directory through session creation.

Verified:
- the default build is unaffected (mlxcel-xla is an optional dep gated off, not compiled)
- cargo check --features xla-backend builds the lib and tests
- fmt and clippy are clean
- the crate and seam unit tests pass

Refs #449.
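To make the stubbed-session behavior concrete, here is a minimal sketch of how a not-wired session and a greedy drive loop could fit together. It is not the code from this PR: the method signatures, the `SessionError` and `Capabilities` types, the token type (`u32`), and the free-function form of `generate_greedy` are all assumptions made for illustration. Only the names `InferenceSession`, `XlaInferenceSession`, `capabilities`, `prefill`, `decode_step`, and `generate_greedy` come from the description above.

```rust
use std::fmt;

/// Hypothetical error type; the real crate's error type is not shown in the PR.
#[derive(Debug, PartialEq)]
pub enum SessionError {
    /// The named operation has no graph execution behind it yet.
    NotWired(&'static str),
}

impl fmt::Display for SessionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SessionError::NotWired(op) => {
                write!(f, "{op}: OpenXLA graph execution is not wired yet")
            }
        }
    }
}

impl std::error::Error for SessionError {}

/// Assumed capability flags for a backend that keeps KV and sampling on-device.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Capabilities {
    pub owns_kv_cache: bool,
    pub samples_on_device: bool,
}

/// Assumed shape of the engine-neutral contract: both steps return the next
/// sampled token id, since sampling is taken to happen inside the backend.
pub trait InferenceSession {
    fn capabilities(&self) -> Capabilities;
    fn prefill(&mut self, prompt: &[u32]) -> Result<u32, SessionError>;
    fn decode_step(&mut self, last_token: u32) -> Result<u32, SessionError>;
}

/// Stubbed session: every execution entry point reports "not wired".
pub struct XlaInferenceSession;

impl InferenceSession for XlaInferenceSession {
    fn capabilities(&self) -> Capabilities {
        Capabilities { owns_kv_cache: true, samples_on_device: true }
    }

    fn prefill(&mut self, _prompt: &[u32]) -> Result<u32, SessionError> {
        Err(SessionError::NotWired("prefill"))
    }

    fn decode_step(&mut self, _last_token: u32) -> Result<u32, SessionError> {
        Err(SessionError::NotWired("decode_step"))
    }
}

/// Greedy drive loop: prefill once, then decode until EOS or the token budget.
/// Errors propagate with `?`, so a stubbed session yields `Err`, not a panic.
pub fn generate_greedy<S: InferenceSession>(
    session: &mut S,
    prompt: &[u32],
    max_new_tokens: usize,
    eos: u32,
) -> Result<Vec<u32>, SessionError> {
    let mut out = Vec::new();
    if max_new_tokens == 0 {
        return Ok(out);
    }
    let mut token = session.prefill(prompt)?;
    loop {
        out.push(token);
        if token == eos || out.len() >= max_new_tokens {
            break;
        }
        token = session.decode_step(token)?;
    }
    Ok(out)
}
```

With this shape, calling `generate_greedy` on the stub returns the prefill error to the caller, which matches the "surfaces it rather than panicking" behavior described above. Wiring the real runtime would then only change the bodies of `prefill` and `decode_step`.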
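The cfg-gated seam can be pictured with a small sketch like the one below. It is a guess at the general pattern rather than the project's actual code: the `Backend::Mlx` variant, the `"mlx"` value, the `String` error type, the `select_backend(Option<&str>)` signature, the `select_backend_from_env` helper, and the behavior when `xla` is requested without the feature are all assumptions. Only `Backend::Xla`, `select_backend`, the `xla-backend` feature name, and `MLXCEL_BACKEND=xla` come from the PR description.

```rust
/// Compute backends known to the seam. The `Xla` variant only exists when the
/// crate is built with `--features xla-backend`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// Assumed name for the default backend.
    Mlx,
    #[cfg(feature = "xla-backend")]
    Xla,
}

/// Map a requested backend name to a `Backend`.
/// Taking the value as a parameter keeps the function easy to unit-test.
pub fn select_backend(requested: Option<&str>) -> Result<Backend, String> {
    match requested {
        // No override (or an empty one) falls back to the default backend.
        None | Some("") | Some("mlx") => Ok(Backend::Mlx),
        #[cfg(feature = "xla-backend")]
        Some("xla") => Ok(Backend::Xla),
        // Assumed behavior: asking for a backend that was not compiled in is
        // reported as an error instead of silently falling back.
        #[cfg(not(feature = "xla-backend"))]
        Some("xla") => Err(
            "MLXCEL_BACKEND=xla requires building with --features xla-backend".to_string(),
        ),
        Some(other) => Err(format!("unknown MLXCEL_BACKEND value: {other}")),
    }
}

/// Hypothetical convenience wrapper that reads the environment variable.
pub fn select_backend_from_env() -> Result<Backend, String> {
    select_backend(std::env::var("MLXCEL_BACKEND").ok().as_deref())
}
```

Because the variant and its match arm are both behind the same `cfg`, a default build never references the optional crate, which is consistent with the note that the default build is unaffected.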