fix(cluster): strip signing_key from /api/cluster/workers#296
Conversation
…ispatch Adds agent_manual.py with build_manual() — a pure function that returns a per-agent operating manual (≤500 tokens) covering @-mention routing, project task verbs, and lead/non-lead status. The router prepends it as role:system before the conversation history on every chat dispatch so agents always know the channel primitives without per-session briefs.
asdict(WorkerInfo) dumps the worker's raw HMAC signing_key bytes into the response. FastAPI's default jsonable_encoder maps bytes via o.decode() with implicit utf-8 — random key material has bytes outside the utf-8 range, so the entire endpoint 500s. Visible symptom: workers disappear from the cluster widget on the dashboard because the list endpoint never succeeds. Beyond the crash, leaking the HMAC signing key over the API has no legitimate purpose. Strip it from the response. Confirmed on Pi: GET /api/cluster/workers was returning 500 with "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2..." every poll. Fedora worker (192.168.6.108) heartbeats are fine — the bug was purely in the list-side serialization.
📝 WalkthroughWalkthroughThis pull request introduces an agent-specific system-prompt injection feature via a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
Code Review SummaryStatus: No Issues Found | Recommendation: Merge Files Reviewed (5 files)
Reviewed by grok-code-fast-1:optimized:free · 55,589 tokens |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tinyagentos/routes/cluster.py (2)
19-33:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix mutable default values in
WorkerRegister(shared state across requests).
WorkerRegisterdefines mutable defaults:
hardware: dict = {}(Line 22)backends: list[dict] = [](Line 23)models: list[str] = [](Line 24)capabilities: list[str] = [](Line 25)Even if Pydantic often copies defaults, relying on that is risky: mutable defaults can lead to cross-request data leakage / unexpected mutation when any code mutates these values.
Suggested fix: switch to
Field(default_factory=...)for each mutable field.🛠️ Proposed fix
-from pydantic import BaseModel +from pydantic import BaseModel, Field class WorkerRegister(BaseModel): name: str url: str - hardware: dict = {} - backends: list[dict] = [] - models: list[str] = [] - capabilities: list[str] = [] + hardware: dict = Field(default_factory=dict) + backends: list[dict] = Field(default_factory=list) + models: list[str] = Field(default_factory=list) + capabilities: list[str] = Field(default_factory=list) platform: str = ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tinyagentos/routes/cluster.py` around lines 19 - 33, WorkerRegister uses mutable default values for hardware, backends, models, and capabilities which can cause shared state; update these fields to use Pydantic's Field with default_factory (e.g., Field(default_factory=dict) for hardware and Field(default_factory=list) for backends/models/capabilities) and import Field from pydantic, leaving other fields and kv_* defaults unchanged so each model instance gets its own fresh collection.
66-89:⚠️ Potential issue | 🟠 MajorAPI contract mismatch:
list_workersreturns bare array, but consumer expects wrapped object.The endpoint returns a bare JSON array, but
tinyagentos/scheduling/resource_manager.py(line 112) calls.get("workers", [])expecting an object shape{"workers": [...]}. When the actual list response is received,.get()fails silently due to exception handling and returns[], causing cluster workers to never be discovered during remote resource checks.Fix: In
resource_manager.pyline 112, changeresp.json().get("workers", [])toresp.json()to match the actual endpoint response.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tinyagentos/routes/cluster.py` around lines 66 - 89, The remote workers API (list_workers) returns a bare JSON array but the client is calling resp.json().get("workers", []), causing discovery to fail; locate the call that uses resp.json().get("workers", []) (the code that fetches remote cluster workers) and replace that expression with resp.json() so the client treats the response as the list itself (ensure the returned value is assigned to the variable used as the workers list).
🧹 Nitpick comments (1)
tinyagentos/routes/cluster.py (1)
81-87: ⚡ Quick winAvoid mutating worker state during a GET response (unless intentional + safe).
Inside
list_workers, whenregistry is not None, the code both:
- computes
tier_id/potential_capabilitiesfor the response (Lines 82-85), and- mutates in-memory worker state to “keep WorkerInfo fields in sync” (Lines 86-87).
If multiple requests come in concurrently (or other code relies on these fields being updated only during heartbeat/registration), this can cause subtle racey/ordering behavior and makes GET have side effects.
If you want to persist these derived fields, consider moving that update to the heartbeat/registration path; otherwise compute only into
dwithout touchingw.tier_id/w.potential_capabilities.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tinyagentos/routes/cluster.py` around lines 81 - 87, In list_workers, avoid mutating in-memory worker state during a GET: remove the assignments to w.tier_id and w.potential_capabilities so the handler only computes tier_id and pot_caps via _potential_capabilities(hardware, registry) and writes them into the response dict d; if you need to persist these derived fields instead, move the w.tier_id/w.potential_capabilities updates into the worker heartbeat/registration code path (where WorkerInfo is intentionally updated) rather than inside list_workers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tinyagentos/routes/cluster.py`:
- Around line 19-33: WorkerRegister uses mutable default values for hardware,
backends, models, and capabilities which can cause shared state; update these
fields to use Pydantic's Field with default_factory (e.g.,
Field(default_factory=dict) for hardware and Field(default_factory=list) for
backends/models/capabilities) and import Field from pydantic, leaving other
fields and kv_* defaults unchanged so each model instance gets its own fresh
collection.
- Around line 66-89: The remote workers API (list_workers) returns a bare JSON
array but the client is calling resp.json().get("workers", []), causing
discovery to fail; locate the call that uses resp.json().get("workers", []) (the
code that fetches remote cluster workers) and replace that expression with
resp.json() so the client treats the response as the list itself (ensure the
returned value is assigned to the variable used as the workers list).
---
Nitpick comments:
In `@tinyagentos/routes/cluster.py`:
- Around line 81-87: In list_workers, avoid mutating in-memory worker state
during a GET: remove the assignments to w.tier_id and w.potential_capabilities
so the handler only computes tier_id and pot_caps via
_potential_capabilities(hardware, registry) and writes them into the response
dict d; if you need to persist these derived fields instead, move the
w.tier_id/w.potential_capabilities updates into the worker
heartbeat/registration code path (where WorkerInfo is intentionally updated)
rather than inside list_workers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c21cb44a-bd03-4e24-b789-2023963cceb1
📒 Files selected for processing (5)
tests/test_agent_chat_router.pytests/test_agent_manual.pytinyagentos/agent_chat_router.pytinyagentos/agent_manual.pytinyagentos/routes/cluster.py
Summary
Repro (pre-fix)
Test plan
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests