Migrate rknn_sd_server to ez_rknn_async backend #60

@jaylfc

Description

rknn_sd_server currently runs inference through rknn-toolkit-lite2. The spike on feat/ez-rknn-async confirmed that ez_rknn_async gives a 1.26x speedup for a single model (tensor-parallel across all 3 NPU cores) and 1.78x for two concurrent models running on separate cores. This migration makes ez_rknn_async the default backend for rknn_sd_server, with automatic fallback to rknn-toolkit-lite2 if the driver is incompatible.

Spike results: scripts/spikes/ez-rknn-async/. In-flight on branch feat/ez-rknn-async.
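A minimal sketch of the try-then-fall-back selection described above, in Python. It deliberately does not call either library: the real constructors for ez_rknn_async and rknn-toolkit-lite2 are passed in as factories, and `open_session` and `BackendFactory` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical type: a zero-argument callable that builds a ready-to-use session.
# The real ez_rknn_async / rknn-toolkit-lite2 constructors would be wrapped in these.
BackendFactory = Callable[[], Any]


def open_session(candidates: Sequence[Tuple[str, BackendFactory]]) -> Tuple[str, Any]:
    """Return (runtime_name, session) for the first backend that initialises."""
    errors: List[str] = []
    for name, factory in candidates:
        try:
            return name, factory()
        except Exception as exc:  # e.g. driver mismatch, missing module, init failure
            errors.append(f"{name}: {exc!r}")
    # Every candidate failed; report all causes together.
    raise RuntimeError("no usable NPU backend: " + "; ".join(errors))
```

Listing ez_rknn_async first and rknn-toolkit-lite2 second gives the default-with-fallback order, and the returned name can be stored so the server knows which path is active.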

Acceptance criteria:

  • rknn_sd_server uses ez_rknn_async InferenceSession by default
  • Automatic fallback to rknn-toolkit-lite2 on driver mismatch or init failure: the user loses the speed wins but keeps a working server
  • /health endpoint reports runtime: "ez_rknn_async" or "rknn-toolkit-lite2" so the Cluster widget can surface which path is active
  • Existing Layer 3 smoke tests pass on both paths
  • tp_mode is configurable per session (default: all when solo, split when co-resident; the default is to be governed by the core-aware scheduler once that lands)
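The /health and tp_mode criteria could be sketched roughly as follows. This is an illustration under assumptions, not the server's actual code: `health_payload` and `resolve_tp_mode` are hypothetical helpers, the `"status"` field is assumed, and treating `"all"` and `"split"` as literal string values is a guess based on the wording above.

```python
from typing import Dict, Optional

# Runtime names taken from the acceptance criteria above.
VALID_RUNTIMES = ("ez_rknn_async", "rknn-toolkit-lite2")


def health_payload(runtime: str) -> Dict[str, str]:
    """Build the /health body; the "status" key is an assumed extra field."""
    if runtime not in VALID_RUNTIMES:
        raise ValueError(f"unknown runtime: {runtime!r}")
    return {"status": "ok", "runtime": runtime}


def resolve_tp_mode(requested: Optional[str], co_resident: bool) -> str:
    """Pick a per-session tp_mode: an explicit request wins, else use the default."""
    if requested is not None:
        return requested
    # Default from the issue: "all" when solo, "split" when co-resident.
    return "split" if co_resident else "all"
```

Once the core-aware scheduler lands, the `co_resident` flag would presumably come from it rather than from the caller.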

Metadata

Assignees

No one assigned

    Labels

      • backend-driven: Backend-driven discovery
      • cluster: Distributed cluster and workers
      • enhancement: New feature or request
      • feature: New feature
      • infrastructure: Build system, CI, deployment
      • kilo-auto-fix: Auto-generated label by Kilo
      • kilo-triaged: Auto-generated label by Kilo
      • models: Model management and inference
      • npu-rk3588: RK3588 NPU work
      • testing: Testing and QA
