Security: aagumin/iskander

docs/security.md

Security

The MVP applies request limits before backend execution:

  • max_batch_rows
  • max_batch_bytes
  • max_columns
  • max_string_bytes
  • max_nested_depth
  • request_timeout_ms

These limits reduce the risk of accidental OOM and reject malformed or unexpectedly large Arrow payloads before they reach model runtimes. The byte estimate is based on Arrow array memory size, so treat it as a guardrail rather than as exact allocator accounting.
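As a rough illustration of how such pre-execution checks can be structured, the sketch below validates a batch summary against a limit set and reports the first limit that is exceeded. The types `RequestLimits`, `BatchStats`, and `LimitExceeded` and the function `check_limits` are hypothetical names chosen for this example and are not taken from the Iskander codebase. The field names simply mirror the limits listed above, and `request_timeout_ms` is left out because it is enforced while waiting for the backend rather than before execution.

```rust
/// Hypothetical limit set; field names mirror the limits listed above.
#[derive(Debug, Clone)]
pub struct RequestLimits {
    pub max_batch_rows: usize,
    pub max_batch_bytes: usize,
    pub max_columns: usize,
    pub max_string_bytes: usize,
    pub max_nested_depth: usize,
}

/// Hypothetical summary of a decoded batch, computed before any backend call.
#[derive(Debug, Clone)]
pub struct BatchStats {
    pub rows: usize,
    /// Estimated from array memory size: a guardrail, not exact accounting.
    pub estimated_bytes: usize,
    pub columns: usize,
    pub longest_string_bytes: usize,
    pub nested_depth: usize,
}

/// Describes the first limit that a batch violated.
#[derive(Debug, PartialEq, Eq)]
pub struct LimitExceeded {
    pub limit: &'static str,
    pub actual: usize,
    pub allowed: usize,
}

/// Rejects the batch if any observed value is above its configured limit.
pub fn check_limits(stats: &BatchStats, limits: &RequestLimits) -> Result<(), LimitExceeded> {
    let checks = [
        ("max_batch_rows", stats.rows, limits.max_batch_rows),
        ("max_batch_bytes", stats.estimated_bytes, limits.max_batch_bytes),
        ("max_columns", stats.columns, limits.max_columns),
        ("max_string_bytes", stats.longest_string_bytes, limits.max_string_bytes),
        ("max_nested_depth", stats.nested_depth, limits.max_nested_depth),
    ];
    for &(limit, actual, allowed) in checks.iter() {
        if actual > allowed {
            return Err(LimitExceeded { limit, actual, allowed });
        }
    }
    Ok(())
}
```

Returning the name of the violated limit, as this sketch does, is one way to make rejections easy to count and report per limit.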

request_timeout_ms is a scheduler wait timeout. It bounds how long the async server path waits for a backend result. For ONNX, Iskander also passes the timeout into ORT RunOptions and triggers RunOptions::terminate() when the timer expires. This is cooperative cancellation inside ONNX Runtime, not a hard kill. External worker backends should use process-level cancellation or worker recycling for stronger isolation.
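The difference between a wait timeout with cooperative cancellation and a hard kill can be sketched with the standard library alone. In the example below, the caller waits on a channel for at most `timeout`, then sets a shared flag that the worker checks between steps. This is only an analogy for the behavior described above: it does not call ONNX Runtime, and `run_with_timeout`, `WaitOutcome`, and the simulated step loop are hypothetical names invented for the illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    Finished(u32),
    TimedOut,
}

/// Runs `steps` units of simulated work on a worker thread and waits at most
/// `timeout` for the result. Returns the outcome seen by the caller and
/// whether the worker stopped early because it observed the terminate flag.
pub fn run_with_timeout(steps: u32, step: Duration, timeout: Duration) -> (WaitOutcome, bool) {
    let terminate = Arc::new(AtomicBool::new(false));
    let worker_flag = Arc::clone(&terminate);
    let (tx, rx) = mpsc::channel();

    let worker = thread::spawn(move || {
        for _ in 0..steps {
            // Cooperative: the worker stops only at points where it checks the flag.
            if worker_flag.load(Ordering::Relaxed) {
                return true;
            }
            thread::sleep(step);
        }
        let _ = tx.send(steps);
        false
    });

    // The wait is bounded even if the worker is still busy.
    let outcome = match rx.recv_timeout(timeout) {
        Ok(value) => WaitOutcome::Finished(value),
        Err(_) => {
            // Ask the worker to stop; nothing here forces it to.
            terminate.store(true, Ordering::Relaxed);
            WaitOutcome::TimedOut
        }
    };

    // Joined here only so the example can report what the worker did.
    let stopped_early = worker.join().expect("worker thread panicked");
    (outcome, stopped_early)
}
```

A worker that never reaches a check point would keep running after the caller has given up, which is why the text recommends process-level cancellation for stronger isolation.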

Malformed Arrow payload handling currently relies on Arrow Flight and Arrow IPC decoding errors. Future hardening should add fuzz tests and stricter error reporting.

Planned controls:

  • TLS and mTLS.
  • Authentication and authorization.
  • Model-level permissions.
  • Model manifest signing or checksum verification.
  • Per-tenant quotas.
  • Per-model concurrency limits.
  • Backpressure and queue limits.
  • Metrics for rejected requests and timeout rates.
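Since these controls are planned rather than implemented, the following is only a sketch of one possible shape for the backpressure and queue-limit item: a bounded queue that rejects new work immediately when full instead of buffering without limit. `BoundedQueue` and `Admission` are hypothetical names, and the capacity is an arbitrary parameter of the example.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Result of trying to enqueue a request.
#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    Queued,
    RejectedQueueFull,
    RejectedShutdown,
}

/// Hypothetical bounded request queue built on a std sync_channel.
pub struct BoundedQueue<T> {
    tx: SyncSender<T>,
}

impl<T> BoundedQueue<T> {
    /// Creates a queue holding at most `capacity` pending items and returns
    /// the receiving end for the consumer (for example, a scheduler loop).
    pub fn new(capacity: usize) -> (Self, Receiver<T>) {
        let (tx, rx) = sync_channel(capacity);
        (BoundedQueue { tx }, rx)
    }

    /// Never blocks: a full queue is reported to the caller right away so
    /// the request can be rejected and counted.
    pub fn try_submit(&self, item: T) -> Admission {
        match self.tx.try_send(item) {
            Ok(()) => Admission::Queued,
            Err(TrySendError::Full(_)) => Admission::RejectedQueueFull,
            Err(TrySendError::Disconnected(_)) => Admission::RejectedShutdown,
        }
    }
}
```

Each `RejectedQueueFull` result is also a natural place to increment a rejected-request counter, which ties into the metrics item in the list.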

Runtime Boundaries

ONNX Runtime runs in-process. It should be treated as trusted native code loaded by the server operator. The Arrow schema manifest constrains the Arrow-facing contract but does not sandbox the model runtime.

SafeTensors support covers metadata and artifact loading only, not execution. Torch and arbitrary Python/pickle models should run in external worker processes so that Python runtime failures, C++ ABI issues, and GPU library conflicts do not compromise the Rust server process.
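A minimal sketch of process-level cancellation for such an external worker is shown below, assuming the worker is started as a child process and polled until a deadline. `run_worker_with_deadline` and `WorkerOutcome` are hypothetical names, the 10 ms polling interval is arbitrary, and a real worker would also need a protocol for exchanging requests and results, which is omitted here.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug)]
pub enum WorkerOutcome {
    /// The worker exited on its own before the deadline.
    Exited(ExitStatus),
    /// The deadline passed and the worker process was killed.
    Killed,
}

/// Spawns the worker command and enforces a hard deadline on it.
pub fn run_worker_with_deadline(mut cmd: Command, timeout: Duration) -> io::Result<WorkerOutcome> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // Non-blocking check for exit.
        if let Some(status) = child.try_wait()? {
            return Ok(WorkerOutcome::Exited(status));
        }
        if Instant::now() >= deadline {
            // Unlike cooperative cancellation, this does not depend on the
            // worker reaching a check point.
            child.kill()?;
            // Reap the process so it does not linger as a zombie.
            child.wait()?;
            return Ok(WorkerOutcome::Killed);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

Because the worker's memory and native libraries live in a separate process, killing it leaves the server process intact, and a fresh worker can be started for the next request.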