fix(image-gen): fix false loaded-model reporting + rknn-sd idle unload#294
Conversation
…nload
Part 1 (routes/models.py):
- rknn-sd: query /health instead of /v1/models; only report the model as
loaded if pipeline_loaded is true. /v1/models lists what the server can
serve (always non-empty), not what is in NPU memory.
- sd-cpp: drop the branch entirely — /sdapi/v1/options does not expose
load state, so reporting the configured checkpoint as loaded is always
wrong. Left a clarifying comment; sd-cpp models surface via the Models
app instead.
Part 2 (services/rknn_sd_server.py):
- Track _last_activity_ts (time.monotonic) — set on pipeline load and
updated at the end of every /generate so an in-flight request cannot
race the idle check.
- _unload_pipeline(): clears _pipe, calls release()/close() best-effort,
gc.collect(). Survives wrappers that throw on release().
- _idle_unload_loop(): background task started from startup hook. Sleeps
IDLE_UNLOAD_INTERVAL_S (60 s default), unloads when idle >=
IDLE_UNLOAD_THRESHOLD_S (600 s default). Set RKNN_SD_IDLE_UNLOAD_S=0
to disable entirely.
- POST /admin/unload: manual eviction endpoint returning {ok, was_loaded}.
Auth is a follow-up; note in docstring.
- GET /health now also returns idle_seconds and idle_unload_threshold_s.
Tests:
- test_routes_models.py: updated rknn-sd tests to use /health mock with
pipeline_loaded true/false; added sd-cpp never-appears-in-loaded test.
- tests/test_rknn_sd_server.py: 16 new tests covering _ensure_pipeline_sync
timestamp, _unload_pipeline (happy path + release-throws), idle loop
logic (monkeypatched time.monotonic + asyncio.sleep), health endpoint
fields, and /admin/unload both states.
📝 WalkthroughWalkthroughThis change set introduces an automatic pipeline idle-unload mechanism for the RKNN SD server, refactors model loading status detection to use backend health endpoints instead of model listings, and adds comprehensive test coverage for the new lifecycle management features. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 8/10 reviews remaining, refill in 8 minutes and 15 seconds. Comment |
Code Review SummaryStatus: No Issues Found | Recommendation: Merge Files Reviewed (4 files)
Reviewed by grok-code-fast-1:optimized:free · 156,815 tokens |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
tests/test_rknn_sd_server.py (1)
21-26: ⚡ Quick winReset all mutable module globals in the shared test helper.
For stronger isolation, also reset
_runtime_name(and recreate_pipe_lock) in_reset_server_state().Suggested helper update
def _reset_server_state(pipe_value=None, last_activity=0.0, load_error=None): """Reset module-level globals to a known state for each test.""" srv._pipe = pipe_value srv._last_activity_ts = last_activity srv._load_error = load_error + srv._runtime_name = "ez_rknn_async" if not srv.USE_LEGACY_WRAPPER else "rknn-toolkit-lite2" + srv._pipe_lock = asyncio.Lock()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_rknn_sd_server.py` around lines 21 - 26, The test helper _reset_server_state should also clear the runtime identifier and replace the lock to avoid cross-test leakage: inside _reset_server_state (the function in tests/test_rknn_sd_server.py) set srv._runtime_name = None and recreate srv._pipe_lock with a fresh lock instance (e.g., threading.Lock()) so each test gets a new _pipe_lock and no stale _runtime_name persists.tests/test_routes_models.py (1)
316-338: ⚡ Quick winStrengthen the
sd-cppskip test by asserting no backend request is made.Right now the test only checks output. Add a call assertion to lock in the “skip entirely” contract.
Suggested test tightening
async with await _make_auth_client(app) as c: with patch.object(app.state, "http_client") as mock_http: mock_http.get = AsyncMock(return_value=mock_response) resp = await c.get("/api/models/loaded") + mock_http.get.assert_not_called()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_routes_models.py` around lines 316 - 338, In test_sd_cpp_never_appears_in_loaded add an assertion that the backend HTTP client was never called: while you patch app.state.http_client and set mock_http.get = AsyncMock(...), after performing resp = await c.get("/api/models/loaded") assert mock_http.get.assert_not_called() (or assert mock_http.get.call_count == 0) to ensure the sd-cpp backend was skipped entirely; keep the assertion before closing/awaiting app.state resources.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tinyagentos/services/rknn_sd_server.py`:
- Around line 175-182: The teardown currently calls both pipe.release() and
pipe.close() if both exist which can double-release resources; change the logic
in the loop that iterates over ("release", "close") to first try to call
pipe.release() if available and ONLY call pipe.close() when pipe.release is not
present; preserve the existing try/except behavior and the logger.info message
(the f"_unload_pipeline: {method_name}() raised (ignored): {exc}" line) and keep
gc.collect() after the teardown so resources are collected.
- Around line 259-271: The POST /admin/unload endpoint (function admin_unload)
currently allows unauthenticated calls that can evict the pipeline via
_unload_pipeline; add an authentication/authorization check at the top of
admin_unload (or as a FastAPI dependency on app.post) that validates a
secret/credential (e.g., an X-Admin-Token header or a
Depends(authenticate_admin) function) using a timing-safe comparison and returns
401/403 when invalid, then only call _unload_pipeline() if the check passes;
ensure the auth helper is reusable for other admin endpoints and log
unauthorized attempts without exposing secrets.
- Around line 216-222: The startup currently spawns the background loop with
asyncio.create_task(...) without retaining the Task; change _startup to assign
that task to a module-level name (e.g., idle_unload_task) so we keep a handle,
add an app.on_event("shutdown") handler (e.g., _shutdown) that cancels
idle_unload_task, awaits it, and handles asyncio.CancelledError cleanly, and
ensure _idle_unload_loop can exit on cancellation; reference the existing
symbols _startup, _idle_unload_loop and use a module-level idle_unload_task and
a new _shutdown to cancel and await the task on shutdown.
- Around line 53-56: The environment-derived check interval currently assigned
to IDLE_UNLOAD_INTERVAL_S (from RKNN_SD_IDLE_CHECK_S) can be zero or negative
and cause a busy loop; validate and sanitize this value when computing
IDLE_UNLOAD_INTERVAL_S by parsing the env var, ensuring it is a positive
non-zero float (or clamp to a safe minimum like 1.0) and falling back to the
existing default ("60") if invalid; update the assignment logic around
IDLE_UNLOAD_INTERVAL_S (and optionally validate
_idle_unload_s_raw/IDLE_UNLOAD_THRESHOLD_S) so negative/zero values cannot be
used to drive the idle loop.
---
Nitpick comments:
In `@tests/test_rknn_sd_server.py`:
- Around line 21-26: The test helper _reset_server_state should also clear the
runtime identifier and replace the lock to avoid cross-test leakage: inside
_reset_server_state (the function in tests/test_rknn_sd_server.py) set
srv._runtime_name = None and recreate srv._pipe_lock with a fresh lock instance
(e.g., threading.Lock()) so each test gets a new _pipe_lock and no stale
_runtime_name persists.
In `@tests/test_routes_models.py`:
- Around line 316-338: In test_sd_cpp_never_appears_in_loaded add an assertion
that the backend HTTP client was never called: while you patch
app.state.http_client and set mock_http.get = AsyncMock(...), after performing
resp = await c.get("/api/models/loaded") assert
mock_http.get.assert_not_called() (or assert mock_http.get.call_count == 0) to
ensure the sd-cpp backend was skipped entirely; keep the assertion before
closing/awaiting app.state resources.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: ff66ad86-4440-4eae-8321-6c1adc89cfbd
📒 Files selected for processing (4)
tests/test_rknn_sd_server.pytests/test_routes_models.pytinyagentos/routes/models.pytinyagentos/services/rknn_sd_server.py
| # Idle-unload settings. Set RKNN_SD_IDLE_UNLOAD_S=0 to disable entirely. | ||
| _idle_unload_s_raw = os.environ.get("RKNN_SD_IDLE_UNLOAD_S", "600") | ||
| IDLE_UNLOAD_THRESHOLD_S: float | None = None if _idle_unload_s_raw.strip() == "0" else float(_idle_unload_s_raw) | ||
| IDLE_UNLOAD_INTERVAL_S: float = float(os.environ.get("RKNN_SD_IDLE_CHECK_S", "60")) |
There was a problem hiding this comment.
Validate idle loop intervals to prevent busy-loop misconfiguration.
RKNN_SD_IDLE_CHECK_S accepts 0/negative values, which can turn Line 208 into a hot loop and spike CPU.
Suggested fix
_idle_unload_s_raw = os.environ.get("RKNN_SD_IDLE_UNLOAD_S", "600")
IDLE_UNLOAD_THRESHOLD_S: float | None = None if _idle_unload_s_raw.strip() == "0" else float(_idle_unload_s_raw)
IDLE_UNLOAD_INTERVAL_S: float = float(os.environ.get("RKNN_SD_IDLE_CHECK_S", "60"))
+if IDLE_UNLOAD_THRESHOLD_S is not None and IDLE_UNLOAD_THRESHOLD_S < 0:
+ raise ValueError("RKNN_SD_IDLE_UNLOAD_S must be >= 0")
+if IDLE_UNLOAD_INTERVAL_S <= 0:
+ raise ValueError("RKNN_SD_IDLE_CHECK_S must be > 0")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tinyagentos/services/rknn_sd_server.py` around lines 53 - 56, The
environment-derived check interval currently assigned to IDLE_UNLOAD_INTERVAL_S
(from RKNN_SD_IDLE_CHECK_S) can be zero or negative and cause a busy loop;
validate and sanitize this value when computing IDLE_UNLOAD_INTERVAL_S by
parsing the env var, ensuring it is a positive non-zero float (or clamp to a
safe minimum like 1.0) and falling back to the existing default ("60") if
invalid; update the assignment logic around IDLE_UNLOAD_INTERVAL_S (and
optionally validate _idle_unload_s_raw/IDLE_UNLOAD_THRESHOLD_S) so negative/zero
values cannot be used to drive the idle loop.
| for method_name in ("release", "close"): | ||
| method = getattr(pipe, method_name, None) | ||
| if callable(method): | ||
| try: | ||
| method() | ||
| except Exception as exc: | ||
| logger.info(f"_unload_pipeline: {method_name}() raised (ignored): {exc}") | ||
| gc.collect() |
There was a problem hiding this comment.
Use close() only as a fallback when release() is unavailable.
Current teardown calls both methods if both exist. That can double-release resources.
Suggested fix
- for method_name in ("release", "close"):
- method = getattr(pipe, method_name, None)
- if callable(method):
- try:
- method()
- except Exception as exc:
- logger.info(f"_unload_pipeline: {method_name}() raised (ignored): {exc}")
+ release = getattr(pipe, "release", None)
+ close = getattr(pipe, "close", None)
+ method_name, method = ("release", release) if callable(release) else ("close", close)
+ if callable(method):
+ try:
+ method()
+ except Exception as exc:
+ logger.info(f"_unload_pipeline: {method_name}() raised (ignored): {exc}")🧰 Tools
🪛 Ruff (0.15.12)
[warning] 180-180: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tinyagentos/services/rknn_sd_server.py` around lines 175 - 182, The teardown
currently calls both pipe.release() and pipe.close() if both exist which can
double-release resources; change the logic in the loop that iterates over
("release", "close") to first try to call pipe.release() if available and ONLY
call pipe.close() when pipe.release is not present; preserve the existing
try/except behavior and the logger.info message (the f"_unload_pipeline:
{method_name}() raised (ignored): {exc}" line) and keep gc.collect() after the
teardown so resources are collected.
| @app.on_event("startup") | ||
| async def _startup(): | ||
| # Intentionally empty: the pipeline is now lazy-loaded on the first | ||
| # /generate request. Old behaviour was to build the full pipeline | ||
| # here, which pinned ~5.5 GB of RAM at boot for users who never | ||
| # actually used image gen. | ||
| # Pipeline is lazy-loaded on the first /generate request. | ||
| # Old behaviour was to build here, pinning ~5.5 GB of RAM at boot. | ||
| logger.info("rknn_sd_server up — pipeline will lazy-load on first /generate request") | ||
| asyncio.create_task(_idle_unload_loop()) | ||
|
|
There was a problem hiding this comment.
Keep a handle to the idle-unload task and cancel it on shutdown.
Line 221 starts a detached task without lifecycle management. Store it and stop it cleanly to avoid orphaned background work.
Suggested fix
app = FastAPI(title="RKNN Stable Diffusion", version="0.1.0")
+_idle_task: asyncio.Task | None = None
@@
`@app.on_event`("startup")
async def _startup():
@@
- asyncio.create_task(_idle_unload_loop())
+ global _idle_task
+ _idle_task = asyncio.create_task(_idle_unload_loop())
+
+@app.on_event("shutdown")
+async def _shutdown():
+ global _idle_task
+ if _idle_task is not None:
+ _idle_task.cancel()
+ try:
+ await _idle_task
+ except asyncio.CancelledError:
+ pass
+ _idle_task = None🧰 Tools
🪛 Ruff (0.15.12)
[warning] 221-221: Store a reference to the return value of asyncio.create_task
(RUF006)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tinyagentos/services/rknn_sd_server.py` around lines 216 - 222, The startup
currently spawns the background loop with asyncio.create_task(...) without
retaining the Task; change _startup to assign that task to a module-level name
(e.g., idle_unload_task) so we keep a handle, add an app.on_event("shutdown")
handler (e.g., _shutdown) that cancels idle_unload_task, awaits it, and handles
asyncio.CancelledError cleanly, and ensure _idle_unload_loop can exit on
cancellation; reference the existing symbols _startup, _idle_unload_loop and use
a module-level idle_unload_task and a new _shutdown to cancel and await the task
on shutdown.
| @app.post("/admin/unload") | ||
| async def admin_unload(): | ||
| """Manually evict the pipeline from NPU memory. | ||
|
|
||
| Returns {ok: true, was_loaded: bool}. | ||
|
|
||
| NOTE: the rknn-sd server binds to RKNN_SD_HOST (default 0.0.0.0). | ||
| Auth scoping is a follow-up — do not expose this port externally | ||
| without a firewall rule or auth middleware. | ||
| """ | ||
| was_loaded = _pipe is not None | ||
| _unload_pipeline() | ||
| return {"ok": True, "was_loaded": was_loaded} |
There was a problem hiding this comment.
Protect POST /admin/unload with authentication/authorization.
This endpoint is currently open and can be abused to repeatedly evict the pipeline.
Minimal hardening option
-from fastapi import FastAPI, HTTPException
+from fastapi import FastAPI, HTTPException, Header
@@
+ADMIN_TOKEN = os.environ.get("RKNN_SD_ADMIN_TOKEN", "").strip()
@@
`@app.post`("/admin/unload")
-async def admin_unload():
+async def admin_unload(x_admin_token: str | None = Header(default=None)):
@@
+ if not ADMIN_TOKEN or x_admin_token != ADMIN_TOKEN:
+ raise HTTPException(status_code=403, detail="forbidden")
was_loaded = _pipe is not None
_unload_pipeline()
return {"ok": True, "was_loaded": was_loaded}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tinyagentos/services/rknn_sd_server.py` around lines 259 - 271, The POST
/admin/unload endpoint (function admin_unload) currently allows unauthenticated
calls that can evict the pipeline via _unload_pipeline; add an
authentication/authorization check at the top of admin_unload (or as a FastAPI
dependency on app.post) that validates a secret/credential (e.g., an
X-Admin-Token header or a Depends(authenticate_admin) function) using a
timing-safe comparison and returns 401/403 when invalid, then only call
_unload_pipeline() if the check passes; ensure the auth helper is reusable for
other admin endpoints and log unauthorized attempts without exposing secrets.
Summary
Part 1 — reporting fix (
routes/models.py): The Loaded Models widget was showinglcm-dreamshaper-v7-rknnas always-loaded on the Pi becauserknn-sdwas queried at/v1/models(which lists what the server can serve) rather than/health(which reports actual NPU memory state). Fixed: now checks/healthand only adds the entry ifpipeline_loadedistrue.sd-cppis dropped from the loaded list entirely — its/sdapi/v1/optionsdoes not expose load state, so the checkpoint name was always misleading; sd-cpp models appear in the Models app instead.Part 2 — idle unload (
services/rknn_sd_server.py): Phase 1.5 ("sequential eviction") is now built. The pipeline is unloaded after 10 minutes of inactivity (configurable viaRKNN_SD_IDLE_UNLOAD_S; set to0to disable). A background loop wakes every 60 s (RKNN_SD_IDLE_CHECK_S) and calls_unload_pipeline()if idle threshold is exceeded.POST /admin/unloadprovides a manual eviction escape hatch./healthnow also reportsidle_secondsandidle_unload_threshold_sfor future dashboard use.Design decisions
time.monotonic()— not wall clock, so NTP jumps and suspend/resume don't cause spurious unloads._ensure_pipeline_sync) AND at the end of every/generateresponse. Setting it at the end (not the start) means an in-flight generate cannot race the idle checker — the pipeline stays live through the full request._unload_pipeline()safety: Triesrelease()thenclose()in a best-effort loop; logs at INFO and continues if either throws. Always sets_pipe = Noneand callsgc.collect().monkeypatchontime.monotonic(returns a constant far past the threshold) andasyncio.sleep(counts calls and raisesCancelledErrorafter N iterations to break the loop). No real sleeps needed.Test plan
pytest tests/test_routes_models.py— 19 tests, all pass (updated rknn-sd tests now mock/health; sd-cpp test now asserts empty loaded list)pytest tests/test_rknn_sd_server.py— 16 new tests covering: activity timestamp on load, idempotent_ensure_pipeline_sync,_unload_pipelinehappy path + release-throws, idle loop unloads past threshold / holds below threshold / noop when disabled,/healthfields,/admin/unloadboth statespytest tests/— 2527 passed, 5 pre-existing failures (hardware detection on macOS, MCP supervisor) unrelated to this PRFollow-ups (out of scope here)
POST /admin/unloadhas no auth; rknn-sd binds to0.0.0.0by default. Auth middleware or firewall rule should gate this before exposing externally./healthnow providesidle_secondsandidle_unload_threshold_sfor a "Will unload in X min" indicator.Summary by CodeRabbit
New Features
Bug Fixes