Skip to content

fix(image-gen): fix false loaded-model reporting + rknn-sd idle unload#294

Merged
jaylfc merged 1 commit into
masterfrom
feat/image-gen-lifecycle-fixes
May 1, 2026
Merged

fix(image-gen): fix false loaded-model reporting + rknn-sd idle unload#294
jaylfc merged 1 commit into
masterfrom
feat/image-gen-lifecycle-fixes

Conversation

@jaylfc
Copy link
Copy Markdown
Owner

@jaylfc jaylfc commented May 1, 2026

Summary

  • Part 1 — reporting fix (routes/models.py): The Loaded Models widget was showing lcm-dreamshaper-v7-rknn as always-loaded on the Pi because rknn-sd was queried at /v1/models (which lists what the server can serve) rather than /health (which reports actual NPU memory state). Fixed: now checks /health and only adds the entry if pipeline_loaded is true. sd-cpp is dropped from the loaded list entirely — its /sdapi/v1/options does not expose load state, so the checkpoint name was always misleading; sd-cpp models appear in the Models app instead.

  • Part 2 — idle unload (services/rknn_sd_server.py): Phase 1.5 ("sequential eviction") is now built. The pipeline is unloaded after 10 minutes of inactivity (configurable via RKNN_SD_IDLE_UNLOAD_S; set to 0 to disable). A background loop wakes every 60 s (RKNN_SD_IDLE_CHECK_S) and calls _unload_pipeline() if idle threshold is exceeded. POST /admin/unload provides a manual eviction escape hatch. /health now also reports idle_seconds and idle_unload_threshold_s for future dashboard use.

Design decisions

  • Time source for idle: time.monotonic() — not wall clock, so NTP jumps and suspend/resume don't cause spurious unloads.
  • Activity timestamp placement: Set on pipeline load (in _ensure_pipeline_sync) AND at the end of every /generate response. Setting it at the end (not the start) means an in-flight generate cannot race the idle checker — the pipeline stays live through the full request.
  • _unload_pipeline() safety: Tries release() then close() in a best-effort loop; logs at INFO and continues if either throws. Always sets _pipe = None and calls gc.collect().
  • sd-cpp dropped, not silenced: The simplest correct option. A comment in the code explains the limitation.
  • Test time mocking: monkeypatch on time.monotonic (returns a constant far past the threshold) and asyncio.sleep (counts calls and raises CancelledError after N iterations to break the loop). No real sleeps needed.

Test plan

  • pytest tests/test_routes_models.py — 19 tests, all pass (updated rknn-sd tests now mock /health; sd-cpp test now asserts empty loaded list)
  • pytest tests/test_rknn_sd_server.py — 16 new tests covering: activity timestamp on load, idempotent _ensure_pipeline_sync, _unload_pipeline happy path + release-throws, idle loop unloads past threshold / holds below threshold / noop when disabled, /health fields, /admin/unload both states
  • Full suite pytest tests/ — 2527 passed, 5 pre-existing failures (hardware detection on macOS, MCP supervisor) unrelated to this PR

Follow-ups (out of scope here)

  • POST /admin/unload has no auth; rknn-sd binds to 0.0.0.0 by default. Auth middleware or firewall rule should gate this before exposing externally.
  • sd-cpp idle unload — different server lifecycle, separate concern.
  • Frontend idle countdown widget — /health now provides idle_seconds and idle_unload_threshold_s for a "Will unload in X min" indicator.

Summary by CodeRabbit

  • New Features

    • Automatic pipeline unload after configurable idle period to optimize resource usage
    • Manual pipeline unload via new admin endpoint
    • Enhanced health status now reports idle time and configured unload threshold
  • Bug Fixes

    • Corrected model loading status reporting for certain backends

…nload

Part 1 (routes/models.py):
- rknn-sd: query /health instead of /v1/models; only report the model as
  loaded if pipeline_loaded is true. /v1/models lists what the server can
  serve (always non-empty), not what is in NPU memory.
- sd-cpp: drop the branch entirely — /sdapi/v1/options does not expose
  load state, so reporting the configured checkpoint as loaded is always
  wrong. Left a clarifying comment; sd-cpp models surface via the Models
  app instead.

Part 2 (services/rknn_sd_server.py):
- Track _last_activity_ts (time.monotonic) — set on pipeline load and
  updated at the end of every /generate so an in-flight request cannot
  race the idle check.
- _unload_pipeline(): clears _pipe, calls release()/close() best-effort,
  gc.collect(). Survives wrappers that throw on release().
- _idle_unload_loop(): background task started from startup hook. Sleeps
  IDLE_UNLOAD_INTERVAL_S (60 s default), unloads when idle >=
  IDLE_UNLOAD_THRESHOLD_S (600 s default). Set RKNN_SD_IDLE_UNLOAD_S=0
  to disable entirely.
- POST /admin/unload: manual eviction endpoint returning {ok, was_loaded}.
  Auth is a follow-up; note in docstring.
- GET /health now also returns idle_seconds and idle_unload_threshold_s.

Tests:
- test_routes_models.py: updated rknn-sd tests to use /health mock with
  pipeline_loaded true/false; added sd-cpp never-appears-in-loaded test.
- tests/test_rknn_sd_server.py: 16 new tests covering _ensure_pipeline_sync
  timestamp, _unload_pipeline (happy path + release-throws), idle loop
  logic (monkeypatched time.monotonic + asyncio.sleep), health endpoint
  fields, and /admin/unload both states.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 1, 2026

📝 Walkthrough

Walkthrough

This change set introduces an automatic pipeline idle-unload mechanism for the RKNN SD server, refactors model loading status detection to use backend health endpoints instead of model listings, and adds comprehensive test coverage for the new lifecycle management features.

Changes

Cohort / File(s) Summary
Pipeline lifecycle and idle unload
tinyagentos/services/rknn_sd_server.py
Adds configurable idle-unload threshold (RKNN_SD_IDLE_UNLOAD_S, RKNN_SD_IDLE_CHECK_S), tracks last activity timestamp, implements _unload_pipeline() with best-effort runtime cleanup, starts background async loop for periodic idle checks, extends /health to report idle seconds and threshold, adds POST /admin/unload endpoint for manual unload.
Model loading status detection
tinyagentos/routes/models.py
Updates rknn-sd loaded status to depend on /health endpoint's pipeline_loaded field instead of model listing presence; removes sd-cpp from contributing any entries to the loaded models list, regardless of checkpoint configuration.
RKNN server test suite
tests/test_rknn_sd_server.py
Adds 272 lines of comprehensive tests covering module state reset, internal function behavior (_ensure_pipeline_sync, _unload_pipeline), async idle unload loop with threshold validation, and ASGI-based HTTP integration tests for /health and /admin/unload endpoints.
Model routes test updates
tests/test_routes_models.py
Reworks assertions for loaded models: rknn-sd-npu now tests conditional inclusion based on mocked pipeline_loaded field; sd-cpp assertion changed to expect zero entries; rknn-sd connection error behavior moved and preserved.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A pipeline sleeps when work is done,
With idle checks beneath the sun—
Health tickers pulse, unloads take flight,
Fresh tests ensure it works just right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.92% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures both main changes: fixing false loaded-model reporting for rknn-sd and sd-cpp, and implementing the idle unload mechanism for rknn-sd.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/image-gen-lifecycle-fixes

Review rate limit: 8/10 reviews remaining, refill in 8 minutes and 15 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@kilo-code-bot
Copy link
Copy Markdown

kilo-code-bot Bot commented May 1, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (4 files)
  • tests/test_rknn_sd_server.py - New test file for idle-unload functionality
  • tests/test_routes_models.py - Updated tests for changed loaded models logic
  • tinyagentos/routes/models.py - Modified to use /health for rknn-sd load state
  • tinyagentos/services/rknn_sd_server.py - Added idle-unload, new endpoints, and activity tracking

Reviewed by grok-code-fast-1:optimized:free · 156,815 tokens

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
tests/test_rknn_sd_server.py (1)

21-26: ⚡ Quick win

Reset all mutable module globals in the shared test helper.

For stronger isolation, also reset _runtime_name (and recreate _pipe_lock) in _reset_server_state().

Suggested helper update
 def _reset_server_state(pipe_value=None, last_activity=0.0, load_error=None):
     """Reset module-level globals to a known state for each test."""
     srv._pipe = pipe_value
     srv._last_activity_ts = last_activity
     srv._load_error = load_error
+    srv._runtime_name = "ez_rknn_async" if not srv.USE_LEGACY_WRAPPER else "rknn-toolkit-lite2"
+    srv._pipe_lock = asyncio.Lock()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_rknn_sd_server.py` around lines 21 - 26, The test helper
_reset_server_state should also clear the runtime identifier and replace the
lock to avoid cross-test leakage: inside _reset_server_state (the function in
tests/test_rknn_sd_server.py) set srv._runtime_name = None and recreate
srv._pipe_lock with a fresh lock instance (e.g., threading.Lock()) so each test
gets a new _pipe_lock and no stale _runtime_name persists.
tests/test_routes_models.py (1)

316-338: ⚡ Quick win

Strengthen the sd-cpp skip test by asserting no backend request is made.

Right now the test only checks output. Add a call assertion to lock in the “skip entirely” contract.

Suggested test tightening
         async with await _make_auth_client(app) as c:
             with patch.object(app.state, "http_client") as mock_http:
                 mock_http.get = AsyncMock(return_value=mock_response)
                 resp = await c.get("/api/models/loaded")
+                mock_http.get.assert_not_called()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_routes_models.py` around lines 316 - 338, In
test_sd_cpp_never_appears_in_loaded add an assertion that the backend HTTP
client was never called: while you patch app.state.http_client and set
mock_http.get = AsyncMock(...), after performing resp = await
c.get("/api/models/loaded") assert mock_http.get.assert_not_called() (or assert
mock_http.get.call_count == 0) to ensure the sd-cpp backend was skipped
entirely; keep the assertion before closing/awaiting app.state resources.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tinyagentos/services/rknn_sd_server.py`:
- Around line 175-182: The teardown currently calls both pipe.release() and
pipe.close() if both exist which can double-release resources; change the logic
in the loop that iterates over ("release", "close") to first try to call
pipe.release() if available and ONLY call pipe.close() when pipe.release is not
present; preserve the existing try/except behavior and the logger.info message
(the f"_unload_pipeline: {method_name}() raised (ignored): {exc}" line) and keep
gc.collect() after the teardown so resources are collected.
- Around line 259-271: The POST /admin/unload endpoint (function admin_unload)
currently allows unauthenticated calls that can evict the pipeline via
_unload_pipeline; add an authentication/authorization check at the top of
admin_unload (or as a FastAPI dependency on app.post) that validates a
secret/credential (e.g., an X-Admin-Token header or a
Depends(authenticate_admin) function) using a timing-safe comparison and returns
401/403 when invalid, then only call _unload_pipeline() if the check passes;
ensure the auth helper is reusable for other admin endpoints and log
unauthorized attempts without exposing secrets.
- Around line 216-222: The startup currently spawns the background loop with
asyncio.create_task(...) without retaining the Task; change _startup to assign
that task to a module-level name (e.g., idle_unload_task) so we keep a handle,
add an app.on_event("shutdown") handler (e.g., _shutdown) that cancels
idle_unload_task, awaits it, and handles asyncio.CancelledError cleanly, and
ensure _idle_unload_loop can exit on cancellation; reference the existing
symbols _startup, _idle_unload_loop and use a module-level idle_unload_task and
a new _shutdown to cancel and await the task on shutdown.
- Around line 53-56: The environment-derived check interval currently assigned
to IDLE_UNLOAD_INTERVAL_S (from RKNN_SD_IDLE_CHECK_S) can be zero or negative
and cause a busy loop; validate and sanitize this value when computing
IDLE_UNLOAD_INTERVAL_S by parsing the env var, ensuring it is a positive
non-zero float (or clamp to a safe minimum like 1.0) and falling back to the
existing default ("60") if invalid; update the assignment logic around
IDLE_UNLOAD_INTERVAL_S (and optionally validate
_idle_unload_s_raw/IDLE_UNLOAD_THRESHOLD_S) so negative/zero values cannot be
used to drive the idle loop.

---

Nitpick comments:
In `@tests/test_rknn_sd_server.py`:
- Around line 21-26: The test helper _reset_server_state should also clear the
runtime identifier and replace the lock to avoid cross-test leakage: inside
_reset_server_state (the function in tests/test_rknn_sd_server.py) set
srv._runtime_name = None and recreate srv._pipe_lock with a fresh lock instance
(e.g., threading.Lock()) so each test gets a new _pipe_lock and no stale
_runtime_name persists.

In `@tests/test_routes_models.py`:
- Around line 316-338: In test_sd_cpp_never_appears_in_loaded add an assertion
that the backend HTTP client was never called: while you patch
app.state.http_client and set mock_http.get = AsyncMock(...), after performing
resp = await c.get("/api/models/loaded") assert
mock_http.get.assert_not_called() (or assert mock_http.get.call_count == 0) to
ensure the sd-cpp backend was skipped entirely; keep the assertion before
closing/awaiting app.state resources.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ff66ad86-4440-4eae-8321-6c1adc89cfbd

📥 Commits

Reviewing files that changed from the base of the PR and between 9e03305 and a32743b.

📒 Files selected for processing (4)
  • tests/test_rknn_sd_server.py
  • tests/test_routes_models.py
  • tinyagentos/routes/models.py
  • tinyagentos/services/rknn_sd_server.py

Comment on lines +53 to +56
# Idle-unload settings. Set RKNN_SD_IDLE_UNLOAD_S=0 to disable entirely.
_idle_unload_s_raw = os.environ.get("RKNN_SD_IDLE_UNLOAD_S", "600")
IDLE_UNLOAD_THRESHOLD_S: float | None = None if _idle_unload_s_raw.strip() == "0" else float(_idle_unload_s_raw)
IDLE_UNLOAD_INTERVAL_S: float = float(os.environ.get("RKNN_SD_IDLE_CHECK_S", "60"))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate idle loop intervals to prevent busy-loop misconfiguration.

RKNN_SD_IDLE_CHECK_S accepts 0/negative values, which can turn Line 208 into a hot loop and spike CPU.

Suggested fix
 _idle_unload_s_raw = os.environ.get("RKNN_SD_IDLE_UNLOAD_S", "600")
 IDLE_UNLOAD_THRESHOLD_S: float | None = None if _idle_unload_s_raw.strip() == "0" else float(_idle_unload_s_raw)
 IDLE_UNLOAD_INTERVAL_S: float = float(os.environ.get("RKNN_SD_IDLE_CHECK_S", "60"))
+if IDLE_UNLOAD_THRESHOLD_S is not None and IDLE_UNLOAD_THRESHOLD_S < 0:
+    raise ValueError("RKNN_SD_IDLE_UNLOAD_S must be >= 0")
+if IDLE_UNLOAD_INTERVAL_S <= 0:
+    raise ValueError("RKNN_SD_IDLE_CHECK_S must be > 0")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tinyagentos/services/rknn_sd_server.py` around lines 53 - 56, The
environment-derived check interval currently assigned to IDLE_UNLOAD_INTERVAL_S
(from RKNN_SD_IDLE_CHECK_S) can be zero or negative and cause a busy loop;
validate and sanitize this value when computing IDLE_UNLOAD_INTERVAL_S by
parsing the env var, ensuring it is a positive non-zero float (or clamp to a
safe minimum like 1.0) and falling back to the existing default ("60") if
invalid; update the assignment logic around IDLE_UNLOAD_INTERVAL_S (and
optionally validate _idle_unload_s_raw/IDLE_UNLOAD_THRESHOLD_S) so negative/zero
values cannot be used to drive the idle loop.

Comment on lines +175 to +182
for method_name in ("release", "close"):
method = getattr(pipe, method_name, None)
if callable(method):
try:
method()
except Exception as exc:
logger.info(f"_unload_pipeline: {method_name}() raised (ignored): {exc}")
gc.collect()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use close() only as a fallback when release() is unavailable.

Current teardown calls both methods if both exist. That can double-release resources.

Suggested fix
-    for method_name in ("release", "close"):
-        method = getattr(pipe, method_name, None)
-        if callable(method):
-            try:
-                method()
-            except Exception as exc:
-                logger.info(f"_unload_pipeline: {method_name}() raised (ignored): {exc}")
+    release = getattr(pipe, "release", None)
+    close = getattr(pipe, "close", None)
+    method_name, method = ("release", release) if callable(release) else ("close", close)
+    if callable(method):
+        try:
+            method()
+        except Exception as exc:
+            logger.info(f"_unload_pipeline: {method_name}() raised (ignored): {exc}")
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 180-180: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tinyagentos/services/rknn_sd_server.py` around lines 175 - 182, The teardown
currently calls both pipe.release() and pipe.close() if both exist which can
double-release resources; change the logic in the loop that iterates over
("release", "close") to first try to call pipe.release() if available and ONLY
call pipe.close() when pipe.release is not present; preserve the existing
try/except behavior and the logger.info message (the f"_unload_pipeline:
{method_name}() raised (ignored): {exc}" line) and keep gc.collect() after the
teardown so resources are collected.

Comment on lines 216 to 222
@app.on_event("startup")
async def _startup():
# Intentionally empty: the pipeline is now lazy-loaded on the first
# /generate request. Old behaviour was to build the full pipeline
# here, which pinned ~5.5 GB of RAM at boot for users who never
# actually used image gen.
# Pipeline is lazy-loaded on the first /generate request.
# Old behaviour was to build here, pinning ~5.5 GB of RAM at boot.
logger.info("rknn_sd_server up — pipeline will lazy-load on first /generate request")
asyncio.create_task(_idle_unload_loop())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep a handle to the idle-unload task and cancel it on shutdown.

Line 221 starts a detached task without lifecycle management. Store it and stop it cleanly to avoid orphaned background work.

Suggested fix
 app = FastAPI(title="RKNN Stable Diffusion", version="0.1.0")
+_idle_task: asyncio.Task | None = None
@@
 `@app.on_event`("startup")
 async def _startup():
@@
-    asyncio.create_task(_idle_unload_loop())
+    global _idle_task
+    _idle_task = asyncio.create_task(_idle_unload_loop())
+
+@app.on_event("shutdown")
+async def _shutdown():
+    global _idle_task
+    if _idle_task is not None:
+        _idle_task.cancel()
+        try:
+            await _idle_task
+        except asyncio.CancelledError:
+            pass
+        _idle_task = None
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 221-221: Store a reference to the return value of asyncio.create_task

(RUF006)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tinyagentos/services/rknn_sd_server.py` around lines 216 - 222, The startup
currently spawns the background loop with asyncio.create_task(...) without
retaining the Task; change _startup to assign that task to a module-level name
(e.g., idle_unload_task) so we keep a handle, add an app.on_event("shutdown")
handler (e.g., _shutdown) that cancels idle_unload_task, awaits it, and handles
asyncio.CancelledError cleanly, and ensure _idle_unload_loop can exit on
cancellation; reference the existing symbols _startup, _idle_unload_loop and use
a module-level idle_unload_task and a new _shutdown to cancel and await the task
on shutdown.

Comment on lines +259 to +271
@app.post("/admin/unload")
async def admin_unload():
"""Manually evict the pipeline from NPU memory.

Returns {ok: true, was_loaded: bool}.

NOTE: the rknn-sd server binds to RKNN_SD_HOST (default 0.0.0.0).
Auth scoping is a follow-up — do not expose this port externally
without a firewall rule or auth middleware.
"""
was_loaded = _pipe is not None
_unload_pipeline()
return {"ok": True, "was_loaded": was_loaded}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Protect POST /admin/unload with authentication/authorization.

This endpoint is currently open and can be abused to repeatedly evict the pipeline.

Minimal hardening option
-from fastapi import FastAPI, HTTPException
+from fastapi import FastAPI, HTTPException, Header
@@
+ADMIN_TOKEN = os.environ.get("RKNN_SD_ADMIN_TOKEN", "").strip()
@@
 `@app.post`("/admin/unload")
-async def admin_unload():
+async def admin_unload(x_admin_token: str | None = Header(default=None)):
@@
+    if not ADMIN_TOKEN or x_admin_token != ADMIN_TOKEN:
+        raise HTTPException(status_code=403, detail="forbidden")
     was_loaded = _pipe is not None
     _unload_pipeline()
     return {"ok": True, "was_loaded": was_loaded}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tinyagentos/services/rknn_sd_server.py` around lines 259 - 271, The POST
/admin/unload endpoint (function admin_unload) currently allows unauthenticated
calls that can evict the pipeline via _unload_pipeline; add an
authentication/authorization check at the top of admin_unload (or as a FastAPI
dependency on app.post) that validates a secret/credential (e.g., an
X-Admin-Token header or a Depends(authenticate_admin) function) using a
timing-safe comparison and returns 401/403 when invalid, then only call
_unload_pipeline() if the check passes; ensure the auth helper is reusable for
other admin endpoints and log unauthorized attempts without exposing secrets.

@jaylfc jaylfc merged commit e44e431 into master May 1, 2026
8 checks passed
@jaylfc jaylfc deleted the feat/image-gen-lifecycle-fixes branch May 1, 2026 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant