Skip to content

[Spec 755] Multi-architect support: builder→architect message routing#757

Merged
waleedkadous merged 29 commits into
mainfrom
builder/spir-755
May 18, 2026
Merged

[Spec 755] Multi-architect support: builder→architect message routing#757
waleedkadous merged 29 commits into
mainfrom
builder/spir-755

Conversation

@waleedkadous
Copy link
Copy Markdown
Contributor

Summary

Adds multi-architect support to Codev workspaces. A workspace can now host multiple named architect terminals; afx send architect from a builder routes to the architect that spawned it — not to a shared singleton. Single-architect workspaces are byte-for-byte identical to before.

Closes #755

This PR delivers feature #1 from the issue (builder-to-architect message routing, the load-bearing item). Features #2–5 are intentionally deferred to follow-up issues.

Changes

User-facing:

  • New afx workspace add-architect [--name <name>] to register additional named architect terminals in an active workspace.
  • First architect defaults to main; subsequent auto-number architect-2, architect-3; explicit name override via --name.
  • afx send architect from a builder reaches its spawning architect only.

Under the hood (3 phases):

  • Phase 1 — Storage and Tower data-model relaxation (commit fd2ae34d). v9 local migration rebuilds the architect table as id TEXT PRIMARY KEY; v13 global backfill sets terminal_sessions.role_id = 'main' for legacy architect rows. ~12 singleton call sites updated across tower-instances.ts, tower-routes.ts, tower-terminals.ts, tower-tunnel.ts, commands/stop.ts, state.ts, db/migrate.ts. CI guardrail prevents entry.architect (singular) regressions.
  • Phase 2 — Naming CLI and spawn-time identity capture (commit ad16f0cf). New CLI, Tower-side addArchitect, HTTP route, client method. CODEV_ARCHITECT_NAME env-var injection. afx spawn records spawnedByArchitect on every builder.
  • Phase 3 — Affinity-aware routing (commits ff8c2bad + f451dcf4). Widened resolveTarget(target, fallbackWorkspace?, sender?). Single-architect fast path for latency parity. Three security rules (legacy-builder, architect-gone, address-spoofing) with verbatim spec error messages. architect:<name> addressing via dedicated resolveArchitectByName (the parseAddress grammar would otherwise mis-split it).

Documentation:

  • codev/resources/commands/agent-farm.md — new afx workspace add-architect command section; new Environment Variables section.
  • codev/resources/arch.md — new "Multi-architect routing" subsection under the Tower section.
  • codev/resources/lessons-learned.md — five new entries.

Backward compatibility

Single-architect workspaces see byte-for-byte identical behavior:

  • /api/state response shape unchanged (scalar state.architect, populated from main or first registered).
  • loadState() / getArchitect() / setArchitect() preserve their existing contracts; new getArchitects() / getArchitectByName() / setArchitectByName() added for multi-architect callers.
  • Latency fast-path skips the per-workspace state.db read entirely for solo-architect workspaces.
  • Existing single-architect dashboards, VSCode extensions, and CLI flows are unchanged.

Testing

  • 2675 codev tests pass (2604 before this PR; +71 new tests). 13 pre-existing skipped tests unchanged.
  • New test files: spec-755-migration.test.ts (10), spec-755-guardrail.test.ts (1), spec-755-phase2.test.ts (30), spec-755-phase3-routing.test.ts (18), spec-755-lookup-builder.test.ts (6).
  • New cases in existing files: tower-routes.test.ts (+8: 6 architects-route + 2 from-forwarding), state.test.ts (+3: spawnedByArchitect persistence), db.test.ts and migrate.test.ts updated for the new singleton-lifted schema.
  • All migration paths covered: forward migrate from pre-v9 with a singleton row, idempotent re-runs, empty-table no-op, global backfill safety.

Test plan (reviewer)

  • pnpm build clean
  • pnpm --filter @cluesmith/codev test green
  • In a fresh workspace: afx workspace start → confirm architect 'main' starts and appears as before
  • In the same workspace: afx workspace add-architect --name sibling → confirm second architect starts
  • Spawn a builder from each architect terminal, send afx send architect "hi from <name>" — confirm only the spawning architect receives the message
  • Try afx workspace add-architect --name 'Invalid Name' → confirm rejection
  • Try afx workspace add-architect --name main (already registered) → confirm collision rejection

Spec

codev/specs/755-multi-architect-support-per-ar.md

Review

codev/reviews/755-multi-architect-support-per-ar.md

Known follow-ups

waleedkadous and others added 29 commits May 17, 2026 13:51
Incorporates feedback from 3-way consultation (Gemini, Codex, Claude):
- Fix migration text: drop CHECK (id = 1), change id to TEXT PRIMARY KEY.
  state.db is per-workspace, so no workspace_path column.
- Enumerate all singleton homes (not just the original three), including
  terminal_sessions.role_id, tower-instances activation guard, workspace
  stop iterations, tower-tunnel, DashboardState, ArchitectState.
- Explicitly plumb 'from' (sender identity) into resolveTarget; today
  it's dropped at handleSend.
- Pin broadcast syntax to 'architects' (plural) to avoid project:agent
  parser collision.
- Specify architect-gone vs. legacy-builder fallback rules with asserted
  error text; specify architect reconnect behavior.
- Keep /api/state response shape scalar in v1; multi-architect UI
  deferred to issue #2.
Addresses five inline comments from architect review:
- Problem statement: correct that only the singleton receives messages
  today (sibling architect is not Tower-registered); add the manual
  copy-paste workaround as the visible day-to-day cost.
- Naming policy: first architect defaults to 'main'; subsequent
  auto-number 'architect-2', 'architect-3'; explicit override via CLI
  (architect's example: 'afx architect --name <name>'). Pins
  character set and length, adds test scenarios for auto-numbering
  and collision rejection.
- 'architectId' == name: renamed throughout. Builder field is now
  'spawnedByArchitect'; no separate internal ID.
- Drop broadcast scope: architect rejected 'architects' broadcast as
  not required. Removed from In Scope, success criteria, tests,
  security carve-outs. Added 'Explicitly NOT in scope' subsection
  with forward-looking note.
- Resolves the previously-critical 'How does an architect declare
  its identity?' open question.
Three sequential phases:
1. Storage and Tower data-model relaxation (schema migration, in-memory
   collection, all singleton call sites updated, CI guardrail). No
   user-visible change.
2. Naming CLI ('afx architect --name <name>' or equivalent) + spawn-time
   identity capture via CODEV_ARCHITECT_NAME env var. Sends still go
   to main.
3. Affinity-aware routing: plumb 'from' into the resolver, implement
   legacy-builder/architect-gone/spoofing rules with asserted error
   texts. The user-visible win lands last.
Iter-1 consultation: Codex REQUEST_CHANGES (HIGH), Claude COMMENT (HIGH),
Gemini COMMENT (no key issues, HIGH).

Convergent issues addressed:
- afx architect already exists as a non-Tower command — Phase 2 keeps
  it unchanged and introduces 'afx workspace add-architect' as the new
  Tower-aware path. Documented rationale.
- state.ts hardcoded 'WHERE id = 1' in loadState/setArchitect now
  enumerated as specific line-level deliverables in Phase 1.
- Rollback strategy rewritten to match the project's actual
  forward-only _migrations framework (references v3, v4 precedents).

Per-reviewer additions:
- Codex: Phase 3 commits to widening resolveTarget signature (not a
  parallel wrapper); rehydration path in tower-terminals.ts:642
  included alongside create-time paths.
- Claude: migrate.ts:40 hardcoded VALUES(1,...) added to deliverables;
  InstanceStatus.architectUrl gets the same main-first shim;
  annotations.parent_id called out as deferred gap; structurally vs.
  byte-identical /api/state; reference af-architect.test.ts;
  concurrent-spawn risk + better-sqlite3 atomicity mitigation.
- Gemini: preserve DEFAULT (datetime('now')) on started_at; add
  single-architect fast-path in resolver for latency parity; detect
  builder context via state.db row presence (not entry.builders),
  catches completed-builder edge case; use readonly Database handle
  to avoid import cycle (mirrors servers/overview.ts).
…ge and Tower data-model relaxation

Phase 1 of three: turns the architect singleton (in-memory and on-disk)
into a name-keyed collection. No user-visible behavior change — the
default architect is named 'main', the dashboard API stays scalar, and
solo-architect workspaces look identical to before.

Type-level changes:
- ArchitectState.name: string
- Builder.spawnedByArchitect?: string
- WorkspaceTerminals.architect → architects: Map<string, string>
- DbArchitect.id: number → string
- DbBuilder.spawned_by_architect: string | null

Schema and migrations (forward-only, project convention):
- Local v9: rebuild architect table as id TEXT PRIMARY KEY, rekey
  existing row to 'main'. Add builders.spawned_by_architect.
- Global v13: backfill terminal_sessions.role_id='main' for legacy
  type='architect' rows where role_id IS NULL. Idempotent.
- Fresh schema.ts updated to match.
- migrate.ts:40 (JSON→SQLite) inserts VALUES ('main', ...).

state.ts:
- loadState / getArchitect: prefer 'main', else first-registered
  (ORDER BY started_at, not lexicographic id — Codex flag).
- setArchitect: main-only setter, preserved for backward compat.
- New setArchitectByName, getArchitects, getArchitectByName,
  lookupBuilderSpawningArchitect (Phase 3 infrastructure).
- upsertBuilder accepts spawnedByArchitect; COALESCE preserves
  existing value on idempotent re-upserts.

Tower call-site sweep:
- tower-instances.ts: activation guard now 'architects.size === 0';
  every create/teardown path uses the Map; role_id stores 'main' at
  save time (vs the prior null).
- tower-terminals.ts: rehydration path (line 642 area) keys
  architect by dbSession.role_id (falls back to 'main' for legacy
  rows the v13 backfill caught); cleanup exit handlers iterate
  the Map to find by terminalId; single Architect tab is emitted
  post-loop iff freshEntry.architects.size > 0 (Codex flag).
- tower-routes.ts: /api/state preserves scalar shape, populated
  with 'main' or first-registered; workspace stop iterates all
  architects.
- tower-tunnel.ts: terminalToWorkspace map iterates all architects.
- tower-messages.ts: resolveAgentInWorkspace returns the 'main'
  architect (or first), preserving today's single-architect
  routing semantics. Phase 3 adds sender-affinity.
- commands/stop.ts: legacy fallback iterates getArchitects() to
  clean up every named architect (Gemini flag).

Tests:
- spec-755-migration.test.ts: 9 tests covering local v9 rebuild,
  preserved DEFAULT (datetime('now')) on started_at, multi-row
  support after migration, spawned_by_architect column addition,
  empty-table no-op, global v13 backfill (target, untouched rows,
  no-overwrite, idempotent), and the started_at-vs-lexicographic
  fallback regression test.
- spec-755-guardrail.test.ts: walks production source and fails
  if any future contributor writes entry.architect (singular).
- Updated 7 existing test files to the new WorkspaceTerminals.architects
  Map shape.

All 2615 codev tests pass; porch build + tests criteria green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…awn-time identity capture

Phase 2 of three: adds the user-facing CLI for registering additional
named architect terminals, wires CODEV_ARCHITECT_NAME env-var injection
into every architect terminal Tower starts, and threads spawned_by_architect
through afx spawn into the builders row. Sends to 'architect' still land on
'main' / first-registered for now — Phase 3 turns on the affinity-aware
routing.

New CLI surface:
- afx workspace add-architect [--name <name>]
  - --name [a-z][a-z0-9-]*, max 64 chars. Rejects empty/whitespace
    explicitly. Without --name, auto-numbers via smallest-unused-integer
    (architect-2, architect-3, ...).
- Existing 'afx architect' (local-only, no Tower) is unchanged — the
  no-Tower contract is preserved.

New library code:
- packages/codev/src/agent-farm/utils/architect-name.ts —
  pure helpers: validateArchitectName, autoNumberArchitectName,
  DEFAULT_ARCHITECT_NAME='main', MAX_ARCHITECT_NAME_LENGTH=64.
- packages/codev/src/agent-farm/commands/workspace-add-architect.ts —
  CLI handler. Client-side validation, fail-fast on empty name.

Tower changes:
- tower-instances.ts:
  - CODEV_ARCHITECT_NAME injected into the workspace-start architect
    env (defaults to 'main') for both shellper and fallback paths.
  - New addArchitect(workspacePath, name?) function: validates name,
    rejects empty/collision, auto-numbers when omitted, creates PTY
    with CODEV_ARCHITECT_NAME injected. Persists to entry.architects,
    terminal_sessions, and the local state.db architect table.
  - Exit handlers clean up entry.architects AND the local state.db row.
  - launchInstance now also persists 'main' to the local state.db
    (fixes pre-existing inconsistency: setArchitect was never called
    in production code, so commands/status.ts and stop.ts read an
    empty table — Codex's review caught this).
- tower-routes.ts:
  - POST /api/workspaces/:encodedPath/architects — accepts optional
    { name } body. 200 on success; 404 if workspace not running;
    400 on validation error; 405 on non-POST.
- tower-client.ts:
  - addArchitect(workspacePath, name?) client method. Distinguishes
    undefined (auto-number) from explicit '' (rejected server-side).

afx spawn:
- Reads CODEV_ARCHITECT_NAME at module load (falls back to 'main' for
  unset/empty/whitespace). All six upsertBuilder call sites pass
  spawnedByArchitect: SPAWNING_ARCHITECT_NAME.

Tests:
- spec-755-phase2.test.ts: 30 cases covering name validation
  (positive/negative/edge), auto-numbering (gaps/contiguous/custom-name
  skipping/Map.keys iterable), env-var detection
  (unset/empty/whitespace/explicit/trimmed/auto-numbered), and CLI
  empty-name rejection contract.
- tower-routes.test.ts: 6 new cases for the architects route
  (success, auto-number, 404 workspace-not-running, 400 collision,
  405 non-POST, 400 bad-encoding).
- state.test.ts: 3 new cases on upsertBuilder
  (persists spawnedByArchitect; preserves it across re-upserts via
  COALESCE; legacy-upsert leaves null).

Documentation:
- codev/resources/commands/agent-farm.md: new 'afx workspace
  add-architect' command section; new Environment Variables section
  documenting CODEV_ARCHITECT_NAME and TOWER_ARCHITECT_CMD; updated
  'afx workspace stop' description to mention sibling-architect
  teardown.

All 2643 codev tests pass; porch build + tests criteria green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itect routing

Phase 3 of three: makes 'afx send architect' from a builder reach ONLY
the architect that spawned it, never sibling architects sharing the
workspace. This is the user-visible win the spec set out to deliver.

Resolver changes:
- resolveTarget(target, fallbackWorkspace?, sender?). Optional sender
  is plumbed from handleSend() (where it always arrived but was dropped
  before resolution).
- New resolveArchitectByName(name, ws, sender) handles 'architect:<name>'
  addresses as per-architect addresses within the workspace, not as
  project:agent cross-workspace addresses. parseAddress can't distinguish
  these (the grammar is overloaded), so resolveTarget intercepts when
  project === 'architect' before findWorkspaceByBasename runs. This is
  the path that enforces the spoofing check.
- resolveAgentInWorkspace gains a fast path: single-architect workspaces
  with the default 'main' name return immediately without touching
  state.db (latency parity for solo-architect users, per Gemini's plan
  optimization).
- Builder context is detected via state.db row presence (three-valued
  return: string | null | undefined). Legacy builders (null row) fall
  back to 'main' if present; architect-gone (target name no longer
  registered) likewise falls back to 'main'; both error with verbatim
  messages if 'main' is absent.
- Per-workspace readonly DB handle in lookupBuilderSpawningArchitect
  (mirrors servers/overview.ts) — Tower can't use the singleton getDb()
  because it serves N workspaces.

Error messages exported as constant functions for single-source-of-truth
testing:
  legacyBuilderErrorMessage(builderId, registered)
  architectGoneErrorMessage(builderId, missingArchitect, registered)
  addressSpoofingErrorMessage(builderId)
All match spec verbatim text (lowercase, no quotes around IDs, no
trailing periods).

Tests (spec-755-phase3-routing.test.ts, 18 cases):
- Single-architect baseline (fast path; state.db read skipped).
- Two architects scoped: main->main, sibling->sibling.
- Legacy builder fallback (main present / main absent + verbatim error).
- Architect-gone fallback (main present / main absent + verbatim error).
- Architect reconnect (route by name, not stale terminalId).
- Non-builder sender → main.
- 'architect:<name>' allowed when matches sender's spawning architect.
- 'architect:<name>' rejected when mismatched (verbatim error).
- 'architect:<name>' allowed for non-builder senders.
- 'architect:<name>' NOT_FOUND for unknown names.
- 'architect:<name>' NO_CONTEXT without workspace context.
- Fast-path edge case: single architect with non-default name does NOT
  fast-path (state.db read still occurs).

All 2667 codev tests pass; porch build + tests criteria green.

Phase 3 closes Spec 755 v1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ge gaps

Iter-2 consult: Gemini APPROVE, Claude APPROVE, Codex REQUEST_CHANGES on
missing test coverage. Addressed two of three:

1. /api/send forwards `from` to resolveTarget
   tower-routes.test.ts: two new cases in POST /api/send block assert
   that resolveTarget is invoked with the `from` value (sender for
   affinity routing) and with undefined for non-builder sends. Pins
   the handleSend → resolveTarget plumbing at the route-handler layer.

2. lookupBuilderSpawningArchitect direct tests
   spec-755-lookup-builder.test.ts: 6 cases against real SQLite that
   exercise the per-workspace readonly handle:
   - Explicit spawned_by_architect → returns string.
   - Legacy NULL row → returns null.
   - No row → returns undefined.
   - Missing state.db file → returns undefined (graceful).
   - Cross-workspace isolation — same builder ID in two workspaces
     returns the right architect for each. This is the bug Gemini
     caught in iter-1 (singleton getDb would have collapsed both).
   - 50 consecutive lookups don't leak DB handles.

3. Full e2e Tower-process test for affinity routing — deferred,
   documented in iter-2 rebuttal. Cost-benefit: route + resolver +
   DB layers are each individually covered; the residual gap is
   mechanical wiring.

All 2675 codev tests pass; porch build + tests criteria green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review document at codev/reviews/755-multi-architect-support-per-ar.md
captures: spec compliance against all 13 success criteria, three plan
deviations with rationale, per-phase summaries, lessons learned, all
consultation rounds (spec iter-1 + iter-2-architect; plan iter-1;
phase 1/2/3 iter-1; phase 3 iter-2) with response disposition, flaky
tests (none), follow-up items, and detailed architecture / lessons
updates.

arch.md gains a "Multi-architect routing (Spec 755)" subsection under
the Tower section. Documents:
- name-keyed WorkspaceTerminals.architects invariant
- four-layer routing chain (CLI -> handleSend -> resolveTarget -> map)
- lookupBuilderSpawningArchitect per-workspace readonly pattern
- backward-compat invariants (/api/state scalar shape)
- the spec-755-guardrail.test.ts CI safeguard
Plus the new POST /api/workspaces/:enc/architects row in the API table.

lessons-learned.md gains five entries in Process / Testing:
- Vestigial production code can survive for unknown durations
  (setArchitect was orphaned; only tests called it).
- Plan migration version numbers should be verified or referenced by
  purpose, not by fixed integer.
- Export error messages as functions for verbatim-text-asserting tests.
- Test the public address grammar, not internal resolver shape.
- CI guardrail tests are cheap insurance for sweep-style refactors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@waleedkadous
Copy link
Copy Markdown
Contributor Author

Architect approval — 5 rounds of CMAP across spec/plan/3 phases caught real bugs and the builder addressed each honestly. The deferred e2e test is documented technical debt with explicit cost-benefit rationale; layer-level coverage is strong and a wiring regression would still surface in existing tests. Filing a follow-up issue for the e2e add. Approved — please wait for CI green, then merge with gh pr merge 757 --merge --admin.

@waleedkadous waleedkadous merged commit 1111672 into main May 18, 2026
6 checks passed
waleedkadous added a commit that referenced this pull request May 18, 2026
waleedkadous added a commit that referenced this pull request May 18, 2026
waleedkadous added a commit that referenced this pull request May 18, 2026
…uction 2026-05-18: PR #757 already merged at commit 1111672 after 5 prior CMAP rounds across 3 phases; manual smoke testing in production deferred to follow-up issues #2-5 from issue #755. Forge tool fix for pr-search merged-PR lookup is filed separately.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multi-architect support: per-architect identity + builder-to-architect message routing

1 participant