Iskander

Experimental MVP: a high-performance Arrow Flight ML inference server for batch, streaming, and data-native workloads.

Status

This project is an experimental foundation, not a production-ready inference platform. The current MVP focuses on one path over Apache Arrow Flight: Arrow RecordBatch in, schema validation, resource limits, ONNX Runtime inference, and Arrow RecordBatch out.

What It Is

Iskander is a Rust inference server built around Apache Arrow Flight. Clients send typed Arrow batches, the server validates a model schema contract, applies security and resource limits, dispatches to a pluggable backend, and returns typed Arrow batches.
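The limit-then-dispatch part of that request path can be pictured with a small standard-library sketch. Everything named here (`Batch`, `Backend`, `Limits`, `ServeError`, `handle`, `SumBackend`) is hypothetical and exists only for illustration; it is not Iskander's actual API, and a real server would operate on Arrow `RecordBatch` values rather than plain vectors.

```rust
/// Hypothetical stand-in for an Arrow RecordBatch: named float32 columns.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<(String, Vec<f32>)>,
}

impl Batch {
    /// Row count, taken from the first column (0 for an empty batch).
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |(_, values)| values.len())
    }
}

#[derive(Debug, PartialEq)]
enum ServeError {
    TooManyRows { limit: usize, actual: usize },
    Backend(String),
}

/// Hypothetical seam for pluggable backends.
trait Backend {
    fn infer(&self, input: &Batch) -> Result<Batch, ServeError>;
}

/// Hypothetical resource limit: maximum rows accepted per request.
struct Limits {
    max_rows: usize,
}

/// Enforce the row limit first, then dispatch to whichever backend is plugged in.
fn handle(backend: &dyn Backend, limits: &Limits, input: &Batch) -> Result<Batch, ServeError> {
    let rows = input.num_rows();
    if rows > limits.max_rows {
        return Err(ServeError::TooManyRows { limit: limits.max_rows, actual: rows });
    }
    backend.infer(input)
}

/// Toy backend for illustration only: sums each row's features into a `score` column.
struct SumBackend;

impl Backend for SumBackend {
    fn infer(&self, input: &Batch) -> Result<Batch, ServeError> {
        let rows = input.num_rows();
        let mut score = vec![0.0f32; rows];
        for (name, column) in &input.columns {
            if column.len() != rows {
                return Err(ServeError::Backend(format!("column {} has a ragged length", name)));
            }
            for (total, value) in score.iter_mut().zip(column) {
                *total += value;
            }
        }
        Ok(Batch { columns: vec![("score".to_string(), score)] })
    }
}
```

Keeping the limit check outside the trait means every backend inherits the same guard without reimplementing it.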

For ONNX models, the server can load an Arrow schema manifest next to the model artifact. That manifest defines the Arrow-facing input/output contract, lets the backend compile tensor mappings at model load time, and gives clients a stable schema contract instead of relying on runtime shape guessing.
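To make the idea of a schema contract concrete, here is a minimal sketch of checking an incoming batch schema against a declared one. The manifest format is not described in this README, so `DType`, `FieldSpec`, `SchemaError`, and `validate_schema` are assumptions made for illustration; a real implementation would compare Arrow `Schema` and `DataType` values.

```rust
/// Hypothetical logical types; a real server would use Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    Float32,
    Int64,
    Utf8,
    FixedSizeListF32(usize),
}

/// One column of a schema contract (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct FieldSpec {
    name: String,
    dtype: DType,
    nullable: bool,
}

#[derive(Debug, PartialEq)]
enum SchemaError {
    ColumnCount { expected: usize, actual: usize },
    NameMismatch { index: usize, expected: String, actual: String },
    TypeMismatch { column: String, expected: DType, actual: DType },
    UnexpectedNullable { column: String },
}

/// Convenience constructor for the examples.
fn field(name: &str, dtype: DType, nullable: bool) -> FieldSpec {
    FieldSpec { name: name.to_string(), dtype, nullable }
}

/// Checks `incoming` against `contract` column by column, in order.
fn validate_schema(contract: &[FieldSpec], incoming: &[FieldSpec]) -> Result<(), SchemaError> {
    if contract.len() != incoming.len() {
        return Err(SchemaError::ColumnCount {
            expected: contract.len(),
            actual: incoming.len(),
        });
    }
    for (index, (want, got)) in contract.iter().zip(incoming).enumerate() {
        if want.name != got.name {
            return Err(SchemaError::NameMismatch {
                index,
                expected: want.name.clone(),
                actual: got.name.clone(),
            });
        }
        if want.dtype != got.dtype {
            return Err(SchemaError::TypeMismatch {
                column: want.name.clone(),
                expected: want.dtype.clone(),
                actual: got.dtype.clone(),
            });
        }
        // A non-nullable contract column rejects nullable input; the reverse is accepted.
        if !want.nullable && got.nullable {
            return Err(SchemaError::UnexpectedNullable { column: want.name.clone() });
        }
    }
    Ok(())
}
```

Rejecting a mismatch before any tensor is built is what lets clients treat the contract as stable: a request either matches the declared columns exactly or fails with a typed error.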

The target workloads are embeddings, recommendations, tabular scoring, micro-batch streaming, lakehouse inference, ETL feature generation, and scientific or industrial batch analytics.

What It Is Not

It is not a generic Triton replacement, REST-first prediction API, LLM token streaming server, hard real-time robotics control loop, or universal inference server for every model family.

Why Arrow Flight

Arrow Flight carries columnar data over gRPC without converting every request into JSON or row-oriented payloads. It fits systems already built on Arrow, Parquet, DataFusion, Polars, Spark, Iceberg, Delta, feature stores, or vector indexing pipelines.

Why Rust

Rust gives this project memory safety, typed errors, async networking, strong Arrow ecosystem support, and clean integration points for model backends. ONNX Runtime support is implemented through the ort crate, while safetensors is used for safe tensor artifact metadata and loading rather than compute.

Zero-Copy Aware Design

The design keeps Arrow buffers as the transport and validation representation. It does not claim full zero-copy inference. The ONNX backend has an input fast path that borrows non-null Arrow float32 / int64 primitive and fixed-size-list buffers as ORT tensor views, and it uses ONNX Runtime I/O binding with preallocated float32 outputs for known output shapes. Copies may still happen during Flight decode, nullable input handling, CPU-to-GPU transfer, and final Arrow output materialization. Copy boundaries are documented in docs/memory_model.md.
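The borrow-or-copy distinction described above can be sketched with `std::borrow::Cow`. This is an illustration under stated assumptions, not the ONNX backend's code: `Float32Column` is a hypothetical stand-in for an Arrow float32 array, and replacing nulls with a `fill` value is an assumption, since this README does not say how nullable inputs are handled beyond noting that they may require a copy.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an Arrow float32 column: a values buffer plus an
/// optional validity mask (`None` means the column has no nulls).
struct Float32Column {
    values: Vec<f32>,
    validity: Option<Vec<bool>>,
}

/// Returns tensor-ready data for the column.
/// - No nulls: borrow the existing buffer, so nothing is copied.
/// - Nulls present: materialize a new buffer with nulls replaced by `fill`
///   (the fill policy is an assumption made for this sketch).
fn tensor_view(col: &Float32Column, fill: f32) -> Cow<'_, [f32]> {
    match &col.validity {
        None => Cow::Borrowed(col.values.as_slice()),
        // A mask that marks every slot valid is as good as no mask.
        Some(mask) if mask.iter().all(|&valid| valid) => Cow::Borrowed(col.values.as_slice()),
        Some(mask) => Cow::Owned(
            col.values
                .iter()
                .zip(mask)
                .map(|(&value, &valid)| if valid { value } else { fill })
                .collect(),
        ),
    }
}
```

The return type makes the copy boundary visible to the caller: `Cow::Borrowed` is the fast path, and `Cow::Owned` marks exactly the case where a copy was unavoidable.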

The current runtime is optimized for data-native batch and micro-batch inference. It is strongest when clients already hold Arrow-compatible columnar data and want Arrow-compatible results back. It is not optimized for single-row REST latency.

Quickstart

Recommended quickstart (MovieLens two-tower model from devmodels):

uv run devmodels/movielens_two_tower/train_two_tower.py
cargo run --release -p iskander-server --features onnx -- \
  --config examples/movielens-two-tower/config.toml

Default server address:

127.0.0.1:50051

The config points the Rust server directly at the exported ONNX artifact.

[[models]]
name = "movielens-two-tower"
backend = "onnx"
path = "devmodels/movielens_two_tower/artifacts/two_tower.onnx"
execution_providers = ["cpu"]
intra_threads = 4
inter_threads = 1
optimization_level = "level3"
parallel_execution = false
memory_pattern = true

Proof Of Concept Benchmark Snapshot

As a local CPU-only proof of concept, the repository now includes a matched-baseline open-loop benchmark harness in crates/iskander-bench. One representative movielens-two-tower run used:

batch size: 128
offered load: 3000 req/s
duration: 90s
result: zero drops and zero errors for all three systems under test (SUTs)

| SUT | Transport | p50 | p95 | p99 | p99.9 | avg |
| --- | --- | --- | --- | --- | --- | --- |
| Iskander | Arrow Flight | 709 us | 1181 us | 3427 us | 8746 us | 815 us |
| Iskander | OIP v2 gRPC | 941 us | 1533 us | 5285 us | 14011 us | 1144 us |
| Triton + ORT | OIP v2 gRPC | 1125 us | 3550 us | 7698 us | 16555 us | 1536 us |

In this snapshot, all three systems sustained the same fixed offered load, and Iskander over Arrow Flight showed the lowest client-observed latency at every reported percentile. Iskander over OIP v2 (Open Inference Protocol) gRPC also stayed below Triton at every reported percentile on the same request shape and ORT-aligned CPU baseline.

These numbers are a proof-of-concept snapshot, not a universal claim. They come from one local matched-baseline setup and should be read together with the methodology in BENCHMARK.md and the raw artifacts in results/benchmarks/2026-05-08-matched-baseline/.
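For scale, the run parameters imply a fixed request schedule and a total request count that can be worked out directly. The sketch below assumes "batch size" means rows per request and that an open-loop generator plans each send time from the target rate alone, independent of when earlier responses arrive. How crates/iskander-bench actually schedules requests is documented in BENCHMARK.md, not here, and the function names are hypothetical.

```rust
use std::time::Duration;

/// Planned send offset of the `i`-th request (0-based) in an open-loop run:
/// the schedule depends only on the target rate, never on response times.
fn send_offset(i: u64, rate_per_sec: u64) -> Duration {
    Duration::from_nanos(i * 1_000_000_000 / rate_per_sec)
}

/// Total requests offered over the whole run.
fn total_requests(rate_per_sec: u64, duration_secs: u64) -> u64 {
    rate_per_sec * duration_secs
}

/// Rows offered per second, assuming each request carries `batch_size` rows.
fn rows_per_sec(rate_per_sec: u64, batch_size: u64) -> u64 {
    rate_per_sec * batch_size
}
```

With the parameters listed above, `total_requests(3000, 90)` is 270,000 requests and `rows_per_sec(3000, 128)` is 384,000 rows per second, with consecutive sends planned roughly 333 microseconds apart.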

Use Cases

A. Batch Embeddings

Generate embeddings for documents, products, users, or images; run batch re-indexing; feed vector databases, feature stores, or lakehouse tables.

Input: id: utf8, text: utf8

Output: id: utf8, embedding: fixed_size_list<float32>[768]
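A fixed-size-list column stores all embeddings in one contiguous child buffer, so row `i` of a 768-wide embedding occupies values `i * 768` through `(i + 1) * 768 - 1`. The sketch below shows that indexing on a plain slice; the helper names are hypothetical, and a real client would read the buffer through an Arrow library instead.

```rust
/// Returns the embedding for `row` from a flat values buffer in which each row
/// occupies `dim` consecutive floats (the layout of a fixed-size-list column).
/// Returns `None` when the row is out of range.
fn embedding_row(values: &[f32], dim: usize, row: usize) -> Option<&[f32]> {
    let start = row.checked_mul(dim)?;
    let end = start.checked_add(dim)?;
    values.get(start..end)
}

/// Number of complete rows in the buffer, or `None` when the buffer length is
/// not a multiple of `dim` (or `dim` is zero).
fn row_count(values: &[f32], dim: usize) -> Option<usize> {
    if dim == 0 || values.len() % dim != 0 {
        None
    } else {
        Some(values.len() / dim)
    }
}
```

Because the width is fixed, no offsets buffer is needed, which is one reason this layout maps cleanly onto a two-dimensional tensor of shape rows by 768.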

B. Tabular Batch Scoring

Use for churn prediction, fraud scoring, credit risk, lead scoring, price prediction, and demand forecasting.

Input: entity_id: utf8, feature_1: float32, feature_2: float32, feature_n: float32

Output: entity_id: utf8, score: float32, label: utf8
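One simple way a `label` column could be derived from `score` is a fixed threshold. This README does not say how labels are produced, so the threshold rule, the label strings, and the `ScoredRow` and `label_scores` names below are all assumptions for illustration.

```rust
/// One output row in the shape listed above (hypothetical struct).
#[derive(Debug, PartialEq)]
struct ScoredRow {
    entity_id: String,
    score: f32,
    label: &'static str,
}

/// Pairs each entity with its score and a thresholded label.
/// Returns `None` when the two input columns differ in length.
fn label_scores(entity_ids: &[&str], scores: &[f32], threshold: f32) -> Option<Vec<ScoredRow>> {
    if entity_ids.len() != scores.len() {
        return None;
    }
    Some(
        entity_ids
            .iter()
            .zip(scores)
            .map(|(&id, &score)| ScoredRow {
                entity_id: id.to_string(),
                score,
                // Label strings are placeholders chosen for this sketch.
                label: if score >= threshold { "positive" } else { "negative" },
            })
            .collect(),
    )
}
```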

C. Recommendation Reranking

Score user-item candidates for feeds, ads, marketplaces, and matching systems.

Input: user_id: utf8, item_id: utf8, user_features: struct, item_features: struct

Output: user_id: utf8, item_id: utf8, relevance_score: float32, rank: int32
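The `rank` column can be read as the position of each candidate within its user's list when sorted by descending `relevance_score`. Whether the rank is produced by the model, the server, or the client is not stated here, so the sketch below is one possible post-processing step using hypothetical `Scored`, `Ranked`, `scored`, and `rank_candidates` names.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    user_id: String,
    item_id: String,
    relevance_score: f32,
}

#[derive(Debug, Clone, PartialEq)]
struct Ranked {
    user_id: String,
    item_id: String,
    relevance_score: f32,
    rank: i32,
}

/// Convenience constructor for the examples.
fn scored(user_id: &str, item_id: &str, relevance_score: f32) -> Scored {
    Scored { user_id: user_id.to_string(), item_id: item_id.to_string(), relevance_score }
}

/// Assigns a 1-based rank per user, highest score first.
/// Output is ordered by user, then by rank; ties keep their input order.
fn rank_candidates(rows: &[Scored]) -> Vec<Ranked> {
    let mut order: Vec<usize> = (0..rows.len()).collect();
    // Stable sort by user, then descending score; `total_cmp` gives f32 a total order.
    order.sort_by(|&a, &b| {
        rows[a]
            .user_id
            .cmp(&rows[b].user_id)
            .then(rows[b].relevance_score.total_cmp(&rows[a].relevance_score))
    });

    let mut out = Vec::with_capacity(rows.len());
    let mut rank = 0i32;
    let mut previous_user: Option<&str> = None;
    for i in order {
        let row = &rows[i];
        // Restart the counter whenever a new user's group begins.
        if previous_user != Some(row.user_id.as_str()) {
            rank = 0;
            previous_user = Some(row.user_id.as_str());
        }
        rank += 1;
        out.push(Ranked {
            user_id: row.user_id.clone(),
            item_id: row.item_id.clone(),
            relevance_score: row.relevance_score,
            rank,
        });
    }
    out
}
```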

D. Streaming Micro-Batch Inference

Apply inference to fraud detection, anomaly detection, IoT telemetry, clickstream scoring, and monitoring streams.

This server is designed for micro-batches, not single-event REST latency.
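On the client side, micro-batching usually means buffering events until either a row count or a wait deadline is reached, then sending one batch. The sketch below shows that policy with the standard library only. `MicroBatcher` and its parameters are hypothetical and not part of Iskander, and the caller passes the current time in explicitly so the behavior stays deterministic.

```rust
use std::time::{Duration, Instant};

/// Buffers events and releases them as a batch when either `max_rows` events
/// are waiting or the oldest buffered event has waited `max_wait`.
struct MicroBatcher<T> {
    max_rows: usize,
    max_wait: Duration,
    buffer: Vec<T>,
    oldest: Option<Instant>,
}

impl<T> MicroBatcher<T> {
    fn new(max_rows: usize, max_wait: Duration) -> Self {
        MicroBatcher { max_rows, max_wait, buffer: Vec::new(), oldest: None }
    }

    /// Adds one event; returns a full batch once `max_rows` is reached.
    fn push(&mut self, item: T, now: Instant) -> Option<Vec<T>> {
        if self.buffer.is_empty() {
            self.oldest = Some(now);
        }
        self.buffer.push(item);
        if self.buffer.len() >= self.max_rows {
            self.take()
        } else {
            None
        }
    }

    /// Call periodically; flushes a partial batch once the deadline has passed.
    fn poll(&mut self, now: Instant) -> Option<Vec<T>> {
        match self.oldest {
            Some(start) if now.duration_since(start) >= self.max_wait => self.take(),
            _ => None,
        }
    }

    /// Empties the buffer and resets the deadline.
    fn take(&mut self) -> Option<Vec<T>> {
        self.oldest = None;
        if self.buffer.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffer))
        }
    }
}
```

The two knobs trade latency for throughput: a larger `max_rows` amortizes per-request overhead, while `max_wait` bounds how long a quiet stream can hold an event back.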

E. Lakehouse / ETL Inference

Run offline inference over Parquet, Iceberg, Delta, and Arrow pipelines. Add prediction columns, generate features, and write enriched datasets.

F. Robotics / Sensor Analytics

Handle lidar point cloud batches, sensor windows, telemetry anomaly detection, fleet analytics, and perception pipeline support.

Not intended for hard real-time control loops.
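"Sensor windows" here can be read as fixed-length slices of a telemetry series, optionally overlapping, that are stacked into a batch for inference. The following is a minimal sketch of that slicing; the `sensor_windows` name and the choice to drop an incomplete trailing window are assumptions for illustration.

```rust
/// Splits a telemetry series into fixed-length windows, advancing the start of
/// each window by `stride` samples. Samples at the end that do not fill a whole
/// window are dropped. Returns an empty list if `window` or `stride` is zero.
fn sensor_windows(samples: &[f32], window: usize, stride: usize) -> Vec<&[f32]> {
    if window == 0 || stride == 0 {
        return Vec::new();
    }
    // `windows` yields every contiguous slice of length `window`;
    // `step_by` keeps only those whose start index is a multiple of `stride`.
    samples.windows(window).step_by(stride).collect()
}
```

Each returned slice borrows from the original series, so building the windows does not copy samples; a copy only happens if the windows are later packed into a single contiguous buffer.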

G. Scientific / Industrial Data

Support genomics, manufacturing sensors, energy grids, finance time series, climate data, and simulation outputs.

Backend Support Matrix

| Backend | Status | Notes |
| --- | --- | --- |
| `onnx` | Implemented / feature-gated | Uses `ort 2.0.0-rc.12`; supports Float32/Int64 inputs with borrowed Arrow-buffer fast paths for compatible non-null layouts. |
| `torch-worker` | Planned | External Python/C++ worker over IPC/gRPC/Arrow IPC planned. |
| `python-worker` | Planned | Mosec-style worker process for pickle/sklearn/Python models. |

License

TBD.
