Migrate rknn_sd_server to ez_rknn_async backend #60
Open
Labels
backend-driven, cluster, enhancement, feature, infrastructure, kilo-auto-fix, kilo-triaged, models, npu-rk3588, testing
`rknn_sd_server` currently uses `rknn-toolkit-lite2`. The spike on `feat/ez-rknn-async` confirmed that `ez_rknn_async` gives a 1.26x single-model speedup (all 3 NPU cores tensor-parallel) and 1.78x for two concurrent models on separate cores. The migration switches `rknn_sd_server` to this backend as the default path, with automatic fallback to `rknn-toolkit-lite2` if the driver is incompatible.

Spike results: `scripts/spikes/ez-rknn-async/`. In-flight on branch `feat/ez-rknn-async`.

Acceptance criteria:

- `rknn_sd_server` uses the `ez_rknn_async` InferenceSession by default
- Falls back to `rknn-toolkit-lite2` on driver mismatch or init failure; the user loses the speed wins but keeps a working server
- `/health` endpoint reports `runtime: "ez_rknn_async"` or `"rknn-toolkit-lite2"` so the Cluster widget can surface which path is active
- `tp_mode` is configurable per session (default: all when solo, split when co-resident; to be governed by the core-aware scheduler once that lands)
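
A rough sketch of how the fallback, `/health`, and `tp_mode` criteria could fit together, in case it helps review. It is backend-agnostic on purpose: the issue does not give the constructor signatures for either runtime, so each backend is passed in as a zero-argument factory. `select_runtime`, `ActiveRuntime`, `default_tp_mode`, `health_payload`, and the `fallback_reason` field are hypothetical names, and treating `"all"` / `"split"` as literal `tp_mode` values is an assumption. Only the two runtime names and the `runtime` key come from the criteria above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Runtime names as reported by /health in the issue; the rest is hypothetical.
PRIMARY = "ez_rknn_async"
FALLBACK = "rknn-toolkit-lite2"


@dataclass
class ActiveRuntime:
    name: str                              # backend that actually initialised
    session: Any                           # opaque backend session object
    fallback_reason: Optional[str] = None  # errors from backends tried earlier


def select_runtime(
    candidates: Sequence[Tuple[str, Callable[[], Any]]],
) -> ActiveRuntime:
    """Try each (name, factory) pair in order; return the first that initialises."""
    errors: List[str] = []
    for name, factory in candidates:
        try:
            session = factory()
        except Exception as exc:  # driver mismatch, missing library, init failure
            errors.append(f"{name}: {exc}")
            continue
        reason = "; ".join(errors) if errors else None
        return ActiveRuntime(name=name, session=session, fallback_reason=reason)
    raise RuntimeError("no usable NPU runtime: " + "; ".join(errors))


def default_tp_mode(co_resident: bool) -> str:
    """Assumed default: 'all' when the model runs solo, 'split' when co-resident."""
    return "split" if co_resident else "all"


def health_payload(active: ActiveRuntime) -> dict:
    """Body for /health; only the 'runtime' key is specified in the issue."""
    return {"runtime": active.name}
```

In the server, the two factories would wrap the real `ez_rknn_async` and `rknn-toolkit-lite2` session setup, with `ez_rknn_async` listed first. Keeping the failure text on the returned object would let the server log why the faster path was skipped without changing the `/health` contract.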