
Multi-architect feature is underbaked — gaps in lifecycle, persistence, and UX #786

@waleedkadous

Summary

The multi-architect feature (sibling architects added via afx workspace add-architect) shipped its primitive in v3.0.5 (#755), dashboard tabs in v3.0.6 (#761), and a critical routing fix in v3.0.8 (#774). But the feature is not yet a coherent, well-thought-through product. There are gaps in lifecycle management, persistence semantics, and UX that an end user discovers immediately when they try to drive it.

This issue is the umbrella for a SPIR-protocol pass to design and ship the missing pieces as a cohesive unit. Goal: by the end, a user can add, manage, evict, and recover sibling architects with the same fluency they have with builders.

Confirmed gaps

1. No way to remove a sibling architect

2. Right-pane terminals (builders, shells) have no close button; sibling architects similarly cannot be closed from where they live

  • The dashboard right pane (where builders and shells render) has no X / close affordance on tabs. Only the left pane does.
  • Sibling architect tabs (in the multi-architect tab strip introduced in v3.0.6, #761) likewise have no close UI. A sibling architect is a closable entity (unlike main, which is workspace-defining), so its tab should have one.
  • This is a broader UX gap that affects more than architects, but architects make it salient.

3. Sibling architects are not persisted in state.db

  • Confirmed empirically: shannon's .agent-farm/state.db architect table is empty even though Tower's in-memory getWorkspaceTerminals() map correctly has both main and ob-refine.
  • Implication: sibling architects exist only in Tower's process memory. A Tower restart wipes them. The user has to re-run add-architect every time Tower goes down.
  • This is the opposite design from how builders work (builders persist across Tower restarts via state.db + shellper auto-rebind).
  • There IS an architect table in the schema (with id, pid, port, cmd, started_at, terminal_id) — the row for main gets written by workspace start, but sibling architects added via add-architect never write rows. So the table exists but the code path skips it for siblings.
  • Question to settle in spec: should siblings persist? If yes, the auto-rebind story needs to mirror builders (Tower restart → re-spawn the architect terminal from the recorded cmd, re-register against the rebound shellper). If no, siblings are explicitly ephemeral and the docs should say so loudly.
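If the spec decides that siblings should persist, the restart path could start from a comparison like the one below. This is a minimal sketch, not the existing code: the column names come from the architect table described above, but the column types, the use of `id` as the architect name, and the `planArchitectRebind` helper are all assumptions made for illustration.

```typescript
// Hypothetical row shape: column names are from the issue, types are assumed.
interface ArchitectRow {
  id: string;         // assumed to hold the architect name, e.g. "main" or "ob-refine"
  pid: number;
  port: number;
  cmd: string;
  started_at: string; // assumed ISO-8601 timestamp
  terminal_id: string;
}

interface RebindPlan {
  respawn: ArchitectRow[]; // persisted but not live: re-spawn from the recorded cmd
  unpersisted: string[];   // live in memory but never written to the table
}

// Compare the persisted rows against the architect names Tower holds in memory.
function planArchitectRebind(
  rows: ArchitectRow[],
  liveNames: ReadonlySet<string>,
): RebindPlan {
  const persisted = new Set(rows.map((row) => row.id));
  return {
    respawn: rows.filter((row) => !liveNames.has(row.id)),
    unpersisted: Array.from(liveNames)
      .filter((name) => !persisted.has(name))
      .sort(),
  };
}
```

Under these assumptions, the state observed on shannon's workspace (siblings live in memory, no sibling rows in the table) would show up in `unpersisted`, and a fresh Tower start with an empty live set would list every recorded architect under `respawn`.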

4. Routing was broken end-to-end from v3.0.5 → v3.0.7 (fixed in #774, pending v3.0.9 publish)

  • Not a new gap — but a symptom of the underlying problem: the headline value prop of the feature ("builder→architect message routes to the spawning sibling") was never exercised end-to-end before shipping.
  • The spec for the umbrella SPIR must include verify-phase steps that literally run afx send architect from a builder spawned by a sibling, and assert the message lands on the sibling's terminal.
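Before the round trip is exercised against a live Tower, the expected routing can be pinned down as a pure function with assertions. The sketch below is a hypothetical model, not the code in tower-messages.ts: the `ArchitectEntry` shape, the `alive` flag, and `resolveArchitectTarget` are assumptions, and the fallback to main reflects the behaviour this issue describes for a missing spawning architect.

```typescript
// Hypothetical model of Tower's in-memory architect map; not the real types.
interface ArchitectEntry {
  terminalId: string;
  alive: boolean; // assumed liveness flag; the real code may track this differently
}

const MAIN_ARCHITECT = "main";

// Resolve where `afx send architect` from a builder should land.
// `spawnedBy` is the name of the architect that spawned the builder.
function resolveArchitectTarget(
  architects: ReadonlyMap<string, ArchitectEntry>,
  spawnedBy: string,
): string | undefined {
  const sibling = architects.get(spawnedBy);
  if (sibling !== undefined && sibling.alive) {
    return sibling.terminalId;
  }
  // Fall back to main when the spawning architect is missing or dead.
  const main = architects.get(MAIN_ARCHITECT);
  return main !== undefined && main.alive ? main.terminalId : undefined;
}
```

A verify-phase check would then assert that a builder spawned by ob-refine resolves to ob-refine's terminal rather than main's, and that the fallback applies only when the sibling entry is missing or marked dead.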

Probable additional gaps (for the spec phase to confirm)

  • Recovery from a crashed sibling architect. If ob-refine's Claude process exits, what state are its in-flight builders in? Do they detect the gone-architect and surface to main? The current fallback chain in tower-messages.ts:332-341 routes to main when the spawning architect is gone — but the in-memory map entry might be stale (terminal_id pointing at a dead PID). Spec should pin behaviour.
  • Naming constraints. A name like ob-refine works; what about main (reserved?), empty string, names with spaces, names with : (collides with the architect:<name> address grammar)? Validation needs to be documented.
  • Cross-architect message addressing. v3.0.5 introduced architect:<name> as an in-workspace address. What about messaging from architect-to-architect? Does main send to ob-refine via architect:ob-refine? Verify this works and document it.
  • VSCode extension surface. The VSCode sidebar shows one Architect tab. With siblings, what does it show? Is there parity with the dashboard's tab strip?
  • afx status output. When there are siblings, does the CLI list them with their PIDs / terminal IDs the way it does for main? Currently afx status mostly elides them.
  • Dashboard tab labelling. The first architect is id: 'architect' (bare) and siblings are architect:<name> (per v3.0.6 spec). When main is the bare one and there are siblings, is the labelling consistent / discoverable?
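The naming and addressing questions above could be settled with a single validation rule shared by add-architect and the address parser. The following is a sketch under stated assumptions: the reserved names, the allowed character set, and both function names are hypothetical, and whether main is actually reserved is exactly what the spec needs to decide.

```typescript
// Hypothetical rules; the issue leaves the actual validation to the spec.
const RESERVED_NAMES: ReadonlySet<string> = new Set(["main", "architect"]);
const NAME_PATTERN = /^[a-z0-9][a-z0-9-]*$/; // assumption: lowercase letters, digits, hyphens

type NameCheck = { ok: true } | { ok: false; reason: string };

function validateArchitectName(name: string): NameCheck {
  if (name.length === 0) return { ok: false, reason: "empty" };
  if (name.includes(":")) return { ok: false, reason: "contains ':'" };
  if (/\s/.test(name)) return { ok: false, reason: "contains whitespace" };
  if (RESERVED_NAMES.has(name)) return { ok: false, reason: "reserved" };
  if (!NAME_PATTERN.test(name)) return { ok: false, reason: "invalid characters" };
  return { ok: true };
}

// Parse the bare "architect" address (assumed to mean main) or "architect:<name>".
// Returns the architect name, or null when the address is not well-formed.
function parseArchitectAddress(address: string): string | null {
  if (address === "architect") return "main";
  const prefix = "architect:";
  if (!address.startsWith(prefix)) return null;
  const name = address.slice(prefix.length);
  return validateArchitectName(name).ok ? name : null;
}
```

Rejecting `:` at creation time keeps the architect:<name> grammar unambiguous, because a parsed address can then contain at most one separator.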

Out of scope for this SPIR

  • Multi-architect-driven workflows beyond the single workspace (cross-workspace routing was deferred earlier; that stays deferred).
  • Renaming architects after add. Suggest filing as a separate small ticket if needed.

Suggested approach

Run as SPIR (strict mode) — feature is large enough to warrant spec-approval + plan-approval gates, and the design choices (persistence model, lifecycle semantics) deserve careful review. The architect (human) should drive the design conversation at the spec gate.

Verify-phase exit criteria must include the manual round-trip test that v3.0.5 lacked: add a sibling architect, spawn a builder from it, send a message to architect from that builder, and observe that it lands on the sibling. Repeat for the remove-architect, crash-recovery, and persistence-across-Tower-restart paths.

Severity / priority

Medium-high. The feature was promoted as a v3.0.6 headline, an external adopter (Shannon) is using it in production, and it's missing the basic lifecycle / UX hygiene that the rest of Codev provides. Not a blocker for shipping v3.0.9 (which fixes the most painful bug, #774), but the next coherent release should land this.
