Skip to content

Daemon + thin client architecture#22

Merged
Aaronontheweb merged 6 commits into
devfrom
feature/daemon-thin-client-architecture
Feb 24, 2026
Merged

Daemon + thin client architecture#22
Aaronontheweb merged 6 commits into
devfrom
feature/daemon-thin-client-architecture

Conversation

@Aaronontheweb
Copy link
Copy Markdown
Collaborator

Summary

Splits Netclaw.App into two binaries and wires them together:

  • netclawd (Netclaw.Daemon) — always-on daemon hosting the Akka actor system, LLM sessions, tool execution, and persistence. Exposes a functional SignalR hub at /hub/session for remote session access.
  • netclaw (Netclaw.Cli) — thin CLI client with daemon management (start/stop/status/install/uninstall) and transitional in-process chat mode.

Key changes

  • Project split: Netclaw.AppNetclaw.Daemon + Netclaw.Cli with shared Netclaw.Configuration
  • SessionHub wired to SessionPipeline: SessionRegistry (singleton) bridges SignalR hub (transient) to Akka.Streams via IHubContext<SessionHub, ISessionHubClient> — creates sessions, accepts messages, streams output
  • Daemon management: DaemonManager handles process lifecycle — binary discovery, PID file tracking, graceful SIGTERM shutdown, systemd user service registration
  • Review hardening: port conflict fix (CLI uses random port), process name validation before SIGTERM, kill() return checking, output sink error logging, OfferAsync result checking, duplicate session guard
  • README: updated with architecture overview and CLI reference

Integration tested

netclaw daemon start          → Daemon started (PID 1229667).
netclaw daemon status         → Daemon running (PID 1229667, uptime 0m 10s).
curl health endpoint          → "healthy"
curl SignalR negotiate        → connectionId + available transports
netclaw daemon start (again)  → Daemon already running (PID 1229667).
netclaw daemon stop           → Daemon stopped (was PID 1229667).
netclaw daemon status         → Daemon is not running.

Test plan

  • dotnet build Netclaw.slnx — 0 errors, 0 warnings
  • dotnet test Netclaw.slnx — 110 passed
  • dotnet slopwatch analyze — 0 issues
  • Daemon lifecycle: start → status → health → negotiate → duplicate guard → stop → status
  • Windows cross-platform daemon management (tracked in Daemon management: Windows cross-platform support #21)

Update PRDs (001, 004, 009), SPEC-011, and IMPLEMENTATION_PLAN to reflect
the OpenClaw-pattern architecture: persistent daemon (Netclaw.Daemon) with
thin CLI/TUI clients (Netclaw.Cli) connecting via SignalR.

Key changes:
- Two binaries: daemon owns Akka/persistence/tools, CLI is lightweight
- TUI chat becomes pure thin client rendering SessionOutput over SignalR
- CLI commands categorized as offline vs daemon-required
- Daemon management commands: start/stop/status/install/uninstall
- systemd user service registration (no sudo)
- New implementation tasks 1.26-1.31 for the split
- Tasks 1.10-1.12 marked as done (PRs #14, #16, #17)
Structural split following the daemon + thin client pattern:

- Netclaw.Daemon (netclawd): Web SDK, actor system, SignalR hub,
  config watcher, health endpoint
- Netclaw.Cli (netclaw): CLI routing (chat, -p, init, doctor,
  daemon stub, config stub), TUI, headless channel

Shared types moved to library projects:
- Config POCOs (ProviderEntry, ModelSelection, ModelReference) →
  Netclaw.Configuration
- SessionOutputDto → Netclaw.Actors.Protocol

CLI temporarily keeps full Akka stack in-process with transitional
copies of ChatClientFactory/NetclawChatClientProvider. Task 1.28
refactors to SignalR client and removes Akka dependencies.
Replace the stub SessionHub with a functional implementation that
creates sessions, accepts messages, and streams output back to
SignalR callers. SessionRegistry (singleton) owns session state and
uses IHubContext to push output from Akka.Streams callbacks without
requiring a hub instance.
DaemonManager handles the full lifecycle: binary discovery, detached
process spawning with PID file tracking, graceful SIGTERM shutdown on
Linux/macOS, and systemd user service registration. Windows service
support tracked in #21.
Review fixes:
- CLI port conflict: use port 0 (random) for transitional in-process mode
- SessionHub: call base.OnDisconnectedAsync for framework cleanup
- SessionRegistry: log output delivery errors instead of swallowing,
  guard against duplicate CreateSession per connection, check OfferAsync
  result for queue failures
- DaemonManager: don't redirect stdio (prevents SIGPIPE / pipe deadlock),
  validate process name before SIGTERM to prevent PID recycling kills,
  check kill() return value, proper DateTimeOffset for uptime calc
- Eliminate duplicate NetclawPaths instances in both Program.cs files

README updated to reflect daemon + thin client architecture with CLI
reference documentation.
Copy link
Copy Markdown
Collaborator Author

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superseded by corrected review comment below. Please ignore this malformed one.

Copy link
Copy Markdown
Collaborator Author

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected review notes (prior comment had shell-escaping artifacts):

  1. netclaw daemon stop can target unrelated dotnet processes because PID validation currently accepts any process named dotnet without verifying command-line identity.
  2. SessionHub.SendMessage does not enforce connection ownership of sessionId, so a client that learns another session ID can inject messages.
  3. Session disconnect currently disposes sessions immediately, which conflicts with the reconnect model in SPEC-011 (daemon keeps session alive for reconnection).
  4. PID ownership is launcher-side only; with systemd Restart=always, PID can drift after daemon restarts and status/stop become unreliable.
  5. Daemon subcommands return error text but still exit code 0, which makes automation/scripts unreliable.

Proposed fixes:

  • Harden daemon PID validation (including command-line identity checks for dotnet-hosted daemon) and validate process identity in status.
  • Enforce per-connection session ownership in hub/registry and add explicit session re-attach flow for reconnects.
  • Move PID file authority into the daemon runtime via hosted service so restarts keep PID accurate.
  • Return non-zero exit codes for daemon command failures.

Enforce connection-scoped SignalR sessions with typed IDs, preserve sessions for reconnect, and tighten daemon PID/process validation for safer lifecycle commands. Add targeted tests and daemon PID file ownership to keep status/stop behavior reliable.
@Aaronontheweb Aaronontheweb merged commit 86e26da into dev Feb 24, 2026
3 checks passed
@Aaronontheweb Aaronontheweb deleted the feature/daemon-thin-client-architecture branch February 24, 2026 05:49
Aaronontheweb added a commit that referenced this pull request May 22, 2026
* fix(security): address open CodeQL alerts

- workflows: add `permissions: contents: read` to website-rebuild.yml
  so the rebuild job no longer inherits the repo's default GITHUB_TOKEN
  scope (cs-actions/missing-workflow-permissions, alert #21).
- webhooks: sanitize the raw route value via SanitizeWebhookId before
  logging the route_not_found case — ASP.NET URL-decodes path segments,
  so a caller could otherwise inject CR/LF into log lines
  (cs/log-forging, alert #22).
- lifecycle: add a SanitizeReason helper to DaemonLifecycleNotifier that
  strips control chars and caps length at 200, and apply it in
  NotifyShutdown (HTTP-tainted via ShutdownDaemonRequest.Reason) and
  NotifyCrashing (cs/log-forging, alert #18).
- reminders: harden ReminderDefinitionStore.GetPath with an explicit
  base-directory containment check on top of the existing
  Uri.EscapeDataString encoding, and add a regression test covering
  traversal, backslash, and absolute-path ids (cs/path-injection,
  alert #17).

* fix(security): tighten log/telemetry sanitization from review

Follow-ups from the post-CodeQL code review:

- webhooks: sanitize `route` once and pass the safe value to both
  WebhookTelemetry.RecordRouteNotFound and the log line, not just the log.
  The metric tag was the same log-forging surface and additionally an
  unbounded high-cardinality vector against the `route` dimension.
- lifecycle: widen the SanitizeReason predicate from `char.IsControl` to
  also strip U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR);
  these are categories Zl/Zp (not Cc) so IsControl misses them, but JSON-
  line readers and many log shippers still split on them.
- lifecycle: strip sanitized chars rather than replacing them with space,
  so an attacker-controlled `reason=ok\nlevel=critical` collapses to
  `reason=oklevel=critical` instead of `reason=ok level=critical` — a
  space would have been a plausible field separator to key=value log
  parsers.
- lifecycle: when truncating reason at 200 chars, back off one position
  if the cut would orphan a high surrogate, so downstream UTF-8 encoders
  don't emit U+FFFD or throw.
- tests: cover the four behaviors above with a Theory across CR/LF, NUL,
  U+2028, U+2029, and a surrogate-pair-boundary truncation case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant