feat: add Phase 1 Python client package over the server by inureyes · Pull Request #411 · lablup/mlxcel

inureyes · 2026-06-23T21:06:24Z

Summary

Phase 1 of Python integration: a pure-Python client package (mlxcel) under a new top-level python/ directory that drives the existing OpenAI-compatible mlxcel serve server. It spawns and supervises a local server process (managed mode) or connects to a running one (connect mode), auto-discovers the served model id, and exposes the raw openai client as an escape hatch for the full API surface. There are zero changes to the Rust inference core; this is Python, a CI workflow, and docs only.

What changed

python/src/mlxcel/_server.py: ManagedServer does binary discovery (binary= / MLXCEL_BIN / PATH), transport selection (Unix domain socket by default on POSIX with a short sun_path under /tmp, TCP elsewhere or when host=/port= is given), subprocess spawn, authoritative /health readiness polling with backoff and child-process liveness checks (so a first-run weight download does not fail spuriously and an early exit raises MlxcelServerError with the captured stderr tail), stderr forwarding to the mlxcel.server logger on a daemon thread, and graceful SIGTERM-then-SIGKILL shutdown with socket cleanup, atexit, and a __del__ finalizer.
python/src/mlxcel/_client.py (sync LLM) and python/src/mlxcel/_async_client.py (AsyncLLM): wrap the OpenAI SDK over a single TCP-or-UDS httpx transport path with explicit timeouts (required when injecting a custom http_client). Shared mode selection, base-URL normalization, and message-type narrowing live in python/src/mlxcel/_common.py. Public methods: generate, stream, chat, chat_stream, models, tokenize, detokenize, plus model and openai_client properties and close(); tokenize/detokenize call the native /tokenize and /detokenize routes (no /v1 prefix) through the underlying httpx client. The model id is discovered once from /v1/models and cached. Mode rules: a model selects managed mode (with socket= as the optional bind path), base_url= or socket= without a model selects connect mode, and a model plus a base_url/transport connect target is an error.
python/src/mlxcel/_sampling.py: maps Python kwargs to OpenAI request fields and routes server-specific knobs (top_k, min_p, repetition_penalty, DRY) and any unknown keys through extra_body, with response_format passthrough; a caller-supplied extra_body= wins on conflict.
python/src/mlxcel/errors.py: MlxcelError, MlxcelServerError (carries the stderr tail), MlxcelTimeoutError. HTTP and API errors propagate as native openai SDK exceptions rather than being hidden.
python/pyproject.toml: hatchling backend, src layout, requires-python >=3.9, deps openai>=1.40 and httpx>=0.27, a dev extra with pytest/ruff/mypy, ruff/mypy config, and the e2e pytest marker. Ships py.typed. Distribution and import name mlxcel, version 0.1.0.
python/tests/: fake_server.py is a stdlib-only HTTP server (binds UDS or TCP, serves /health 503-then-200, /v1/models, /v1/completions and /v1/chat/completions including SSE streaming variants, /tokenize, /detokenize). test_client_mock.py uses httpx.MockTransport (no real server) to assert generate/stream/chat/tokenize behavior, sampling mapping, model auto-discovery, and error mapping. test_lifecycle.py spawns fake_server.py via ManagedServer and exercises discovery, spawn, health-poll, ready, log capture, graceful shutdown, the early-exit failure path, and connect-mode-to-a-running-UDS-server. test_e2e.py is marked @pytest.mark.e2e and skipped unless MLXCEL_BIN is set.
python/examples/: quickstart.py, streaming.py, structured_output.py. python/README.md: install and usage. python/.gitignore: venv, caches, build artifacts.
.github/workflows/python.yml: runs ruff check, ruff format --check, mypy, and pytest (unit + lifecycle; e2e skipped) on ubuntu-latest, triggered only on python/** and the workflow file, independent of the Rust CI.
Docs: new docs/python-client.md (both modes, streaming, chat, structured output, the openai_client escape hatch, async usage, troubleshooting incl. the socket-path-length note), linked from docs/README.md and the root README.md, with an English nav entry in mkdocs.yml. The Korean nav and translation are left to the finalizer.
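The mode rules listed for the clients can be sketched as a small pure function. This is only an illustration: `select_mode` is a hypothetical name, the real logic lives in python/src/mlxcel/_common.py and may differ, and `ValueError` is an assumed exception type.

```python
from typing import Optional


def select_mode(
    model: Optional[str] = None,
    base_url: Optional[str] = None,
    socket: Optional[str] = None,
) -> str:
    """Return "managed" or "connect" following the mode rules above.

    Hypothetical helper for illustration; not the package's actual API.
    """
    if model is not None:
        if base_url is not None:
            # A model (managed mode) plus a connect target is ambiguous.
            raise ValueError("pass either a model or base_url=, not both")
        # socket= is allowed here: it is the optional bind path.
        return "managed"
    if base_url is not None or socket is not None:
        # No model: attach to a server that is already running.
        return "connect"
    raise ValueError("pass a model, base_url=, or socket=")
```

Keeping the decision in one function means the sync and async clients cannot drift apart on which argument combinations are accepted.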

Test plan

Verified in a clean venv created outside the repo (/tmp/mlxcel-venv) with pip install -e python[dev]. The Rust binary was not built and the e2e test was not run (this host is Linux + CUDA; mlxcel targets Apple Silicon), so the real-binary E2E path stays marked @pytest.mark.e2e and skipped.

ruff check python -> All checks passed
ruff format --check python -> 14 files already formatted
mypy python/src -> Success: no issues found in 7 source files
pytest python/tests -m "not e2e" -> 29 passed, 2 deselected
import mlxcel works; mlxcel.__version__ == "0.1.0"; LLM/AsyncLLM/error types exported
No .rs or Cargo.* files changed

Closes #407

Add a pure-Python client package under python/ that drives the existing OpenAI-compatible mlxcel server. It either spawns and supervises a local `mlxcel serve` process (managed mode) or connects to a running one (connect mode), auto-discovers the served model id from /v1/models, and exposes the raw openai client as an escape hatch. No changes to the Rust inference core.

Package contents:

- src/mlxcel/_server.py: ManagedServer handles binary discovery (binary= / MLXCEL_BIN / PATH), transport selection (Unix socket default on POSIX with a short sun_path under /tmp, TCP elsewhere or on request), subprocess spawn, /health readiness polling with backoff and child-liveness checks, stderr forwarding to the mlxcel.server logger, and graceful SIGTERM-then-SIGKILL shutdown with socket cleanup, atexit, and a finalizer.
- src/mlxcel/_client.py and src/mlxcel/_async_client.py: the synchronous LLM and asynchronous AsyncLLM, each wrapping the OpenAI SDK over a TCP or UDS httpx transport with explicit timeouts. Shared mode-selection, base-URL handling, and message-type narrowing live in src/mlxcel/_common.py. Methods: generate, stream, chat, chat_stream, models, tokenize, detokenize, plus model and openai_client properties and close(). tokenize/detokenize call the native /tokenize and /detokenize routes through the underlying httpx client.
- src/mlxcel/_sampling.py: maps Python kwargs to OpenAI request fields and routes server-specific knobs (top_k, min_p, repetition_penalty, DRY) and unknown keys through extra_body, with response_format passthrough.
- src/mlxcel/errors.py: MlxcelError, MlxcelServerError (carries stderr tail), MlxcelTimeoutError. HTTP and API errors propagate as native openai exceptions.

Tests, CI, and docs:

- tests/: stdlib-only fake_server.py (UDS or TCP, /health 503-then-200, canned /v1/* incl. SSE, /tokenize, /detokenize); test_client_mock.py uses httpx.MockTransport; test_lifecycle.py spawns the fake server via ManagedServer; test_e2e.py is marked e2e and skipped unless MLXCEL_BIN is set.
- .github/workflows/python.yml runs ruff, ruff format --check, mypy, and pytest (unit + lifecycle) on ubuntu-latest, independent of the Rust CI and triggered only on python/** changes.
- docs/python-client.md documents both modes, streaming, chat, structured output, the openai_client escape hatch, async usage, and the socket-path-length note; linked from README.md, docs/README.md, and the mkdocs nav.
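The /health readiness polling with backoff and child-liveness checks described for ManagedServer might look roughly like the loop below. It is a minimal sketch under stated assumptions: `wait_until_ready` and its parameters are hypothetical, `probe` stands in for one /health request, and the built-in `RuntimeError` and `TimeoutError` stand in for the package's own error types (the source says an early exit raises MlxcelServerError with the stderr tail).

```python
import subprocess
import time
from typing import Callable


def wait_until_ready(
    proc: subprocess.Popen,
    probe: Callable[[], bool],
    timeout: float = 30.0,
    initial_delay: float = 0.05,
    max_delay: float = 1.0,
) -> None:
    """Poll probe() with exponential backoff until it returns True.

    Raises RuntimeError if the child exits first and TimeoutError when the
    deadline passes. Hypothetical helper, not the package's actual code.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        # Liveness first: a dead child can never become healthy.
        code = proc.poll()
        if code is not None:
            raise RuntimeError(f"server exited early with code {code}")
        if probe():
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("server did not become ready in time")
        # Never sleep past the deadline; double the delay up to max_delay.
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

Checking the child on every iteration is what lets a crash surface immediately instead of waiting out the full timeout, while a generous timeout tolerates a slow first-run weight download.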

inureyes · 2026-06-23T21:32:06Z

Implementation Review Summary

Intent

Phase 1 pure-Python mlxcel client over the existing OpenAI-compatible server: managed (spawn mlxcel serve) and connect modes, UDS/TCP transport, model auto-discovery, openai_client escape hatch. Zero Rust changes.

Findings Addressed (auto-fixed on this branch, not yet committed)

Native /tokenize and /detokenize sent no Authorization header, so they returned 401 whenever api_key was set. Now attach Bearer <key> on the raw httpx requests in both LLM and AsyncLLM, and omit the header entirely when no key is configured (no-auth path preserved). (HIGH)
response_format lived in the shared OpenAI-field list, so generate(..., response_format=...) raised TypeError (completions.create does not accept it as a keyword argument). build_params is now endpoint-aware: chat keeps it top-level, while completions routes it through extra_body. (HIGH)
The API key was passed to the child via --api-key <secret> on argv (visible in ps / /proc/<pid>/cmdline, and logged at DEBUG). It is now passed through the LLAMA_API_KEY environment variable, and the launch log no longer carries the secret. (HIGH, security)
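The endpoint-aware routing from the second finding can be illustrated with a small stand-in for build_params. The signature, the endpoint names, and the contents of `_OPENAI_FIELDS` are assumptions made for this sketch; only the routing rules (response_format top-level for chat, through extra_body for completions, caller extra_body wins) come from the PR text.

```python
from typing import Any, Dict, Optional

# Assumed subset of fields the OpenAI SDK accepts as keyword arguments on
# both endpoints. Everything else is treated as server-specific.
_OPENAI_FIELDS = {"temperature", "top_p", "max_tokens", "stop", "seed"}


def build_params(
    endpoint: str,
    extra_body: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Split kwargs into top-level SDK fields and extra_body entries."""
    params: Dict[str, Any] = {}
    extra: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key in _OPENAI_FIELDS:
            params[key] = value
        elif key == "response_format" and endpoint == "chat":
            # The chat endpoint accepts response_format at the top level.
            params[key] = value
        else:
            # top_k, min_p, repetition_penalty, unknown keys, and
            # response_format on the plain completions endpoint.
            extra[key] = value
    # A caller-supplied extra_body wins on conflict.
    extra.update(extra_body or {})
    if extra:
        params["extra_body"] = extra
    return params
```

For example, `build_params("completions", response_format={"type": "json_object"})` places the value under `extra_body`, so the SDK call never sees an unexpected keyword.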

Remaining Items (report only, no code change)

mkdocs nav adds user-guide/python-client.md, but with docs_dir: docs/en that resolves to docs/en/user-guide/python-client.md, which has no matching source file. The whole docs/en/ tree is untracked in git (every existing nav sibling is too, maintained out-of-band), and there is no docs CI or --strict build, so no automated gate breaks. The tracked GitHub-facing page docs/python-client.md and its README.md / docs/README.md links are correct. Finalizer should sync the page into docs/en/user-guide/ alongside the Korean nav/translation. (LOW)
_server._probe_once catches ConnectError/ConnectTimeout/ReadError but not RemoteProtocolError/ReadTimeout; bounded by the 5s probe timeout and the per-iteration child-liveness check, so impact is minimal. Optional hardening. (LOW)

Verification

All stated requirements implemented (sync + async, all listed methods, sampling + extra_body + response_format, error types, mock/lifecycle/e2e tests, CI workflow, docs, examples, py.typed, .gitignore)
No placeholder/mock/orphaned code; every module imported and wired through __init__
Integrated into the package code flow (clients use _server/_common/_sampling/errors)
Project conventions followed (3.9-compatible typing, Google docstrings, ruff/mypy strict clean)
Existing modules reused (_common shared by both clients; OpenAI SDK + httpx, no reinvention)
No unintended structural changes; zero .rs / Cargo.* changes
Tests pass: ruff check clean, ruff format --check clean, mypy python/src clean, pytest -m "not e2e" 37 passed / 2 deselected (e2e correctly gated)

Fixes are staged on feature/issue-407-python-client and not committed; commit/push left to the maintainer per review policy.

Pass the server API key through the LLAMA_API_KEY environment variable instead of argv so it is not exposed via ps or /proc/<pid>/cmdline, and attach a Bearer header on the native /tokenize and /detokenize posts that bypass the OpenAI SDK auth injection. Route response_format through extra_body for the plain completions endpoint (the SDK rejects it as a top-level field there) while keeping it top-level for chat. Add regression tests for auth-header presence and absence, response_format routing, and the API key staying out of argv.
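A minimal sketch of the two auth changes in this commit, assuming hypothetical helper names (`child_env`, `auth_headers`); only the LLAMA_API_KEY variable and the Bearer scheme come from the commit message.

```python
import os
from typing import Dict, Mapping, Optional


def child_env(
    api_key: Optional[str],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build the child's environment; the key never appears in argv."""
    env = dict(os.environ if base_env is None else base_env)
    if api_key:
        env["LLAMA_API_KEY"] = api_key
    return env


def auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Headers for raw /tokenize and /detokenize posts."""
    # Omit the header entirely when no key is configured, so the
    # no-auth path keeps working.
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}
```

The resulting mapping would be handed to `subprocess.Popen(..., env=...)`, which keeps the secret out of `ps` output and /proc/<pid>/cmdline.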

Wrap the resource-creating section of LLM.__init__ and AsyncLLM.__init__ so a failure after the http client is built (or the managed subprocess is spawned), for example an empty /v1/models discovery response, deterministically tears those resources down instead of leaking them until garbage collection. The sync client reuses its idempotent close(); the async client does synchronous cleanup because it cannot await in __init__. Add a best-effort __del__ to AsyncLLM mirroring the sync client so a never-awaited close() still stops the managed server and drops the async http client pool. await close() and async-with remain the correct API.
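The teardown-on-failure pattern for the sync client can be sketched with toy classes. `Resource`, `Client`, and the injected `make_resource`/`discover` callables are hypothetical stand-ins for the http client, the managed server, and model discovery; the real constructors differ.

```python
from typing import Callable, Optional


class Resource:
    """Stand-in for an http client or a managed server process."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class Client:
    """Toy client whose constructor releases resources if setup fails."""

    def __init__(
        self,
        make_resource: Callable[[], Resource],
        discover: Callable[[], str],
    ) -> None:
        self._closed = False
        self._resource: Optional[Resource] = None
        try:
            self._resource = make_resource()
            self.model = discover()  # may raise, e.g. on an empty model list
        except BaseException:
            # Tear down whatever was created, then re-raise unchanged.
            self.close()
            raise

    def close(self) -> None:
        # Idempotent, so it is safe to call from __init__ and again later.
        if self._closed:
            return
        self._closed = True
        if self._resource is not None:
            self._resource.close()
            self._resource = None
```

Because close() is idempotent, the same method serves both the failure path inside the constructor and normal shutdown.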

…y failure

AsyncLLM.__init__ called is_managed() before self._closed = False, so __del__ raised AttributeError when the constructor failed at argument validation (ambiguous-args path). Move _closed initialization first. Add tests for async chat_stream, models(), openai_client escape hatch, model property before resolution, and mode-validation errors on AsyncLLM.
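The ordering fix matters because Python still runs `__del__` on an object whose `__init__` raised. A toy illustration, with a hypothetical `Handle` class standing in for AsyncLLM and `ValueError` as an assumed exception type:

```python
from typing import Optional


class Handle:
    """Toy object with a finalizer that must tolerate a failed __init__."""

    def __init__(
        self, model: Optional[str] = None, base_url: Optional[str] = None
    ) -> None:
        # Set first: __del__ runs even when validation below raises.
        self._closed = False
        if model is not None and base_url is not None:
            raise ValueError("ambiguous arguments")
        self.target = model or base_url

    def close(self) -> None:
        self._closed = True

    def __del__(self) -> None:
        # Safe only because _closed exists before anything can raise.
        if not self._closed:
            self.close()
```

If the `_closed` assignment came after the validation, the finalizer would hit an AttributeError on the half-built object.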

inureyes · 2026-06-23T21:48:15Z

PR Finalization Complete

Summary

Tests: Added 7 tests covering previously untested async paths:

test_async_chat_stream (async chat_stream was the only unexercised generation method)
test_async_models (async models() list)
test_async_openai_client_escape_hatch (async openai_client property type check)
test_async_model_property_raises_before_resolution (AsyncLLM.model before any request)
test_async_ambiguous_args_is_error and test_async_no_args_is_error (mode validation on AsyncLLM)

Bug fix: AsyncLLM.__init__ called is_managed() before self._closed = False, so __del__ raised AttributeError when the constructor failed at argument validation. Moved _closed initialization to the top of __init__. The new mode-validation tests caught this and now pass cleanly.

Docs (GitHub-facing): Added a "Security: multi-user hosts" section to docs/python-client.md between the Connect Mode and Streaming sections, recommending an explicit socket= path under $XDG_RUNTIME_DIR on shared machines.

Docs (MkDocs): Created docs/en/user-guide/python-client.md and docs/ko/user-guide/python-client.md in the internal docs tree (not tracked in the public repo, matching the established pattern). Added the Korean nav entry Python 클라이언트: user-guide/python-client.md to mkdocs.ko.yml.

Lint/Format: All checks pass. No Rust files touched.

Final gate results

ruff check python    -> All checks passed
ruff format --check  -> 14 files already formatted
mypy python/src      -> Success: no issues found in 7 source files
pytest (not e2e)     -> 43 passed, 2 deselected

Ready for merge.

inureyes added the status:review, type:enhancement, and priority:high labels Jun 23, 2026

inureyes added 3 commits June 24, 2026 06:33

inureyes added the status:done label and removed the status:review label Jun 23, 2026

inureyes merged commit 8e5d426 into main Jun 23, 2026
6 checks passed

inureyes deleted the feature/issue-407-python-client branch June 23, 2026 22:04

inureyes merged 4 commits into main from feature/issue-407-python-client
