diff --git a/.cursor/mcp.json b/.cursor/mcp.json deleted file mode 100644 index 19951a368..000000000 --- a/.cursor/mcp.json +++ /dev/null @@ -1,14 +0,0 @@ -{ - "mcpServers": { - "relaycast": { - "command": "npx", - "args": [ - "-y", - "@relaycast/mcp" - ], - "env": { - "RELAY_BASE_URL": "https://api.relaycast.dev" - } - } - } -} diff --git a/.cursor/settings.json b/.cursor/settings.json deleted file mode 100644 index 1cc52554d..000000000 --- a/.cursor/settings.json +++ /dev/null @@ -1,7 +0,0 @@ -{ - "permissions": { - "allow": [ - "mcp__agent-relay__*" - ] - } -} diff --git a/.factory/settings.json b/.factory/settings.json deleted file mode 100644 index 565f14af7..000000000 --- a/.factory/settings.json +++ /dev/null @@ -1,5 +0,0 @@ -{ - "enabledPlugins": { - "core@factory-plugins": true - } -} \ No newline at end of file diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md deleted file mode 100644 index f416a9f92..000000000 --- a/ARCHITECTURE.md +++ /dev/null @@ -1,738 +0,0 @@ -# Agent Relay: Architecture & Design Document - -## Executive Summary - -Agent Relay is a real-time messaging system that enables autonomous agent-to-agent communication. It allows AI coding assistants (Claude, Codex, Gemini, etc.) running in separate terminal sessions to discover each other and exchange messages without human intervention. - -The system works by: - -1. Wrapping agent CLI processes in PTY sessions managed by a Rust broker -2. Providing MCP tools for agent communication (mcp__relaycast__message_dm_send, mcp__relaycast__agent_add, etc.) -3. Routing messages through Relaycast (cloud WebSocket service) -4. Injecting incoming messages directly into agent terminal input - -This document provides complete transparency into how the system works, its design decisions, limitations, and trade-offs. - ---- - -## Table of Contents - -1. [System Overview](#1-system-overview) -2. [Architecture Layers](#2-architecture-layers) -3. [Component Deep Dive](#3-component-deep-dive) -4. 
[Protocol Specification](#4-protocol-specification) -5. [Message Flow](#5-message-flow) -6. [Data Storage](#6-data-storage) -7. [Security Model](#7-security-model) -8. [Design Decisions & Trade-offs](#8-design-decisions--trade-offs) -9. [Known Limitations](#9-known-limitations) -10. [Future Considerations](#10-future-considerations) - ---- - -## 1. System Overview - -### 1.1 Problem Statement - -Modern AI coding assistants operate in isolation. When you run multiple agents on different parts of a codebase, they cannot: - -- Share discoveries or context -- Coordinate on interdependent tasks -- Request help from specialized agents -- Avoid duplicate work - -Agent Relay solves this by providing a communication layer that requires **zero modification** to the underlying AI systems. - -### 1.2 Core Principle: MCP Tool Protocol - -The fundamental insight is that AI agents can invoke MCP (Model Context Protocol) tools. By providing relay tools (`mcp__relaycast__message_dm_send`, `mcp__relaycast__agent_add`, `mcp__relaycast__agent_list`, etc.) via MCP, agents can communicate without modifying the underlying AI system.
- -This approach: - -- Works with any CLI-based agent that supports MCP -- Requires no agent-side code changes -- Preserves the user's normal terminal experience -- Allows agents to communicate using natural language - -### 1.3 High-Level Architecture - -``` -┌─────────────────────────────────────────────────────────────────────────┐ -│ User's Terminal │ -│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ -│ │ agent-relay │ │ agent-relay │ │ agent-relay │ │ -│ │ spawn Alice │ │ spawn Bob │ │ spawn Carol │ │ -│ │ claude │ │ codex │ │ gemini │ │ -│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │ -│ │ │ │ │ -│ │ PTY Sessions │ PTY Sessions │ PTY Sessions │ -│ │ │ │ │ -│ └────────────────────┼────────────────────┘ │ -│ │ │ -│ ┌───────────▼───────────┐ │ -│ │ Broker (Rust) │ │ -│ │ agent-relay-broker │ │ -│ └───────────┬───────────┘ │ -│ │ │ -│ ┌───────────▼───────────┐ │ -│ │ Relaycast Cloud │ │ -│ │ (WebSocket) │ │ -│ └───────────────────────┘ │ -└─────────────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 2. Architecture Layers - -The system is organized into five distinct layers: - -### Layer 1: CLI Interface (`src/cli/`) - -Entry point for users. Parses commands, manages broker lifecycle, handles agent spawning and messaging. - -### Layer 2: Broker (`src/main.rs` + `src/lib.rs`) - -Rust binary that manages PTY sessions, parses agent output, routes messages via Relaycast WebSocket, and handles agent lifecycle. - -### Layer 3: SDK (`packages/sdk/`) - -TypeScript SDK for programmatic access. Drives the broker binary over stdio, provides spawn/release/event APIs. - -### Layer 4: Storage (`packages/storage/`) - -Message persistence using JSONL format. Supports queries by sender/recipient/time. - -### Layer 5: Dashboard (`packages/dashboard/`) - -Web UI for monitoring. Shows connected agents, message flow, real-time updates. 
- -``` -┌─────────────────────────────────────────────────────────────────┐ -│ Layer 1: CLI │ -│ ┌─────────────────────────────────────────────────────────────┐│ -│ │ Commands: up, down, status, spawn, bridge, doctor ││ -│ └─────────────────────────────────────────────────────────────┘│ -├─────────────────────────────────────────────────────────────────┤ -│ Layer 2: Broker (Rust) │ -│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ -│ │ PTY Manager │ │ MCP Tools │ │ Relaycast WS │ │ -│ │ (Agent mgmt) │ │ (send_dm) │ │ (Routing) │ │ -│ └───────────────┘ └───────────────┘ └───────────────┘ │ -├─────────────────────────────────────────────────────────────────┤ -│ Layer 3: SDK │ -│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ -│ │ Client │ │ Workflows │ │ Relay Adapter │ │ -│ │ (Stdio I/O) │ │ (DAG runner) │ │ (High-level) │ │ -│ └───────────────┘ └───────────────┘ └───────────────┘ │ -├─────────────────────────────────────────────────────────────────┤ -│ Layer 4: Storage │ -│ ┌───────────────┐ ┌───────────────┐ │ -│ │ Adapter │ │ JSONL │ │ -│ │ (Interface) │ │ (Persistence) │ │ -│ └───────────────┘ └───────────────┘ │ -├─────────────────────────────────────────────────────────────────┤ -│ Layer 5: Dashboard │ -│ ┌───────────────┐ ┌───────────────┐ │ -│ │ Next.js │ │ WebSocket │ │ -│ │ (REST API) │ │ (Real-time) │ │ -│ └───────────────┘ └───────────────┘ │ -└─────────────────────────────────────────────────────────────────┘ -``` - ---- - -## 3. Component Deep Dive - -### 3.1 Broker (`src/main.rs`) - -The broker is a Rust binary (`agent-relay-broker`) that serves as the core runtime. It has several subcommands: - -- **`init`** — Starts as a broker hub, connecting to Relaycast and managing spawned agents via stdio protocol. Supports `--api-port ` to start an HTTP API for dashboard proxy (spawn/release/list endpoints). -- **`pty`** — Wraps a single CLI in a PTY session with message injection -- **`headless`** — Runs a provider (Claude, etc.) 
in headless/API mode -- **`wrap`** — Internal command used by the SDK to wrap a CLI in a PTY with passthrough - -#### PTY Session Management - -The broker uses native PTY sessions (via `portable-pty`) instead of tmux: - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ Broker Process │ -│ │ -│ ┌──────────────────────────────────────────────────────────┐ │ -│ │ PTY Session │ │ -│ │ ┌────────────────────────────────────────────────────┐ │ │ -│ │ │ Agent Process (claude, etc.) │ │ │ -│ │ │ │ │ │ -│ │ │ Output: "I'll send a message to Bob" │ │ │ -│ │ │ MCP call: mcp__relaycast__message_dm_send(to: "Bob", text: "...")│ │ │ -│ │ │ │ │ │ -│ │ └────────────────────────────────────────────────────┘ │ │ -│ └──────────────────────────────────────────────────────────┘ │ -│ │ │ -│ │ PTY output streaming │ -│ ▼ │ -│ ┌──────────────────────────────────────────────────────────┐ │ -│ │ MCP Tool Handler │ │ -│ │ - Process MCP tool invocations from agents │ │ -│ │ - Parse send_dm, agent_add, etc. │ │ -│ │ - Deduplicate (hash-based) │ │ -│ └──────────────────────────────────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌──────────────────────────────────────────────────────────┐ │ -│ │ Relaycast WebSocket │ │ -│ │ - Send message to Relaycast cloud │ │ -│ │ - Receive messages from other agents │ │ -│ │ - Handle workspace authentication │ │ -│ └──────────────────────────────────────────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌──────────────────────────────────────────────────────────┐ │ -│ │ Message Injection │ │ -│ │ - Wait for agent idle (configurable threshold) │ │ -│ │ - Write to PTY stdin: "Relay message from X [id]: ..." │ │ -│ │ - Press Enter │ │ -│ └──────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────┘ -``` - -#### Key Implementation Details - -**1. 
PTY-Based Agent Wrapping** -The broker uses `portable-pty` for cross-platform PTY management, replacing the previous tmux-based approach. This eliminates the tmux dependency and provides more direct control over agent I/O. - -**2. ANSI Stripping** -Output is stripped of ANSI escape codes before pattern matching to handle terminal formatting. - -**3. MCP Tool Protocol** -Agents communicate by invoking MCP tools (e.g., `mcp__relaycast__message_dm_send`, `mcp__relaycast__agent_add`, `mcp__relaycast__agent_list`). The broker processes these tool calls and routes messages accordingly. - -**4. Message Deduplication** -Uses a hash-based dedup cache to prevent re-sending the same message: - -```rust -let dedup = DedupCache::new(); -// Messages are hashed and checked before routing -``` - -**5. Idle Detection for Injection** -Configurable idle threshold (default 30s) before injecting messages. The broker monitors agent output and waits for silence before delivering incoming messages. - -**6. CLI-Specific Handling** -Different CLIs need different injection strategies. The broker handles CLI-specific quirks for Claude, Codex, Gemini, Aider, and Goose. - -### 3.2 SDK (`packages/sdk/`) - -The TypeScript SDK provides programmatic access to the broker: - -```typescript -import { AgentRelayClient } from '@agent-relay/sdk'; - -// Start broker and connect -const client = await AgentRelayClient.start({ env: process.env }); - -// Spawn agents in PTY sessions -await client.spawnPty({ name: 'Worker', cli: 'claude', channels: ['general'] }); - -// Listen for events -client.on('event', (event) => console.log(event)); - -// Clean up -await client.release('Worker'); -await client.shutdown(); -``` - -The SDK communicates with the broker via stdio using a JSON-based request/response protocol. 
- -#### High-Level API (`AgentRelay`) - -```typescript -import { AgentRelay } from '@agent-relay/sdk'; - -const relay = new AgentRelay(); - -// Idle detection -relay.onAgentIdle = ({ name, idleSecs }) => { - console.log(`${name} idle for ${idleSecs}s`); -}; - -const agent = await relay.spawnPty({ - name: 'Worker', - cli: 'claude', - channels: ['general'], - idleThresholdSecs: 30, -}); - -await agent.waitForIdle(120_000); -await relay.shutdown(); -``` - -### 3.3 Relaycast Cloud - -Messages are routed through Relaycast, a cloud WebSocket service: - -- Workspace-based isolation (each project gets a workspace) -- Agent registration and presence -- Channel-based messaging -- Direct messages and threading -- Persistent message history - -### 3.4 Workflow Engine (`packages/sdk/src/workflows/`) - -The SDK includes a DAG-based workflow runner for multi-step agent coordination: - -- Define workflows as YAML templates or programmatically via `WorkflowBuilder` -- Steps can have dependencies, creating a directed acyclic graph -- Built-in templates for common patterns: code review, bug fix, feature development -- Step output chaining via `{{steps.X.output}}` template syntax - ---- - -## 4. 
Protocol Specification - -### 4.1 MCP Tool Protocol - -Agents communicate by invoking MCP tools provided by the Relaycast MCP server: - -| Tool | Description | -| --------------------------------------------- | ---------------------------- | -| `mcp__relaycast__message_dm_send(to, text)` | Send a DM to an agent | -| `mcp__relaycast__post_message(channel, text)` | Post a message to a channel | -| `mcp__relaycast__agent_add(name, cli, task)` | Spawn a worker agent | -| `mcp__relaycast__agent_remove(name)` | Release a worker agent | -| `mcp__relaycast__agent_list()` | List connected agents | -| `mcp__relaycast__message_inbox_check()` | Check incoming messages | - -### 4.2 Broker Stdio Protocol - -The SDK communicates with the broker binary via JSON-line stdio: - -**Requests** (SDK → Broker): - -```json -{ "id": "uuid", "method": "spawn_pty", "params": { "name": "Worker", "cli": "claude" } } -``` - -**Responses** (Broker → SDK): - -```json -{ "id": "uuid", "result": { "ok": true } } -``` - -**Events** (Broker → SDK): - -```json -{ "event": "agent_idle", "data": { "name": "Worker", "idle_secs": 30 } } -``` - -### 4.3 Spawn/Release Protocol - -``` -# Spawn -KIND: spawn -NAME: WorkerName -CLI: claude - -Task description here. - -# Release -KIND: release -NAME: WorkerName -``` - -### 4.4 Message Delivery - -``` -Alice (Agent) Broker Relaycast Bob (Agent) - │ │ │ │ - │── send_dm() ─────────▶│ │ │ - │ │── WebSocket msg ──▶│ │ - │ │ │── WebSocket msg ──▶│ (Bob's broker) - │ │ │ │ - │ │ │ inject into PTY - │ │ │ "Relay message │ - │ │ │ from Alice..." │ -``` - ---- - -## 5. Message Flow - -### 5.1 Complete End-to-End Flow - -``` -┌─────────────────────────────────────────────────────────────────────────┐ -│ 1. 
AGENT INVOKES MCP TOOL │ -│ Agent calls: mcp__relaycast__message_dm_send(to: "Bob", text: "Can you review auth.ts?") -└─────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────┐ -│ 2. BROKER PROCESSES TOOL CALL │ -│ Broker receives MCP tool invocation │ -│ Deduplication check (hash-based) │ -└─────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────┐ -│ 3. RELAYCAST ROUTING │ -│ Broker sends message via WebSocket to Relaycast cloud │ -│ Relaycast routes to Bob's workspace/channel │ -└─────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────┐ -│ 4. BOB'S BROKER RECEIVES │ -│ WebSocket delivers message to Bob's broker │ -│ Message queued for injection │ -└─────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────┐ -│ 5. IDLE DETECTION + INJECTION │ -│ Wait for idle threshold (no output from Bob's agent) │ -│ Write to PTY stdin: "Relay message from Alice [abc12345]: │ -│ Can you review auth.ts?" │ -│ Press Enter │ -└─────────────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────────────┐ -│ 6. BOB'S AGENT PROCESSES │ -│ The message appears as user input in Bob's PTY │ -│ Bob's agent processes it as a new message │ -└─────────────────────────────────────────────────────────────────────────┘ - ``` - -### 5.2 Broadcast Flow - -When sending to `TO: *`: - -``` -Alice Relaycast Bob, Carol, Dave - │ │ │ - │──── message ──────────▶│ │ - │ { to: "*", ...
} │ │ - │ │ │ - │ │──── deliver ──────────▶│ Bob - │ │──── deliver ──────────▶│ Carol - │ │──── deliver ──────────▶│ Dave - │ │ │ - │ │ (Alice excluded) │ -``` - ---- - -## 6. Data Storage - -### 6.1 Storage Architecture - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ StorageAdapter Interface │ -├─────────────────────────────────────────────────────────────────┤ -│ init(): Promise │ -│ saveMessage(message: StoredMessage): Promise │ -│ getMessages(query: MessageQuery): Promise │ -│ getMessageById(id: string): Promise │ -│ close(): Promise │ -└─────────────────────────────────────────────────────────────────┘ - │ - ┌───────────────┼───────────────┐ - │ │ │ - ▼ ▼ ▼ - ┌───────────┐ ┌───────────┐ ┌───────────┐ - │ JSONL │ │ Memory │ │ DLQ │ - │ Adapter │ │ Adapter │ │ Adapter │ - └───────────┘ └───────────┘ └───────────┘ -``` - -### 6.2 File Locations - -``` -.agent-relay/ -├── credentials/ # Auth tokens -├── state.json # Broker state (agents, channels) -└── pending/ # Pending deliveries -``` - ---- - -## 7. 
Security Model - -### 7.1 Trust Boundaries - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ TRUST BOUNDARY: Local Machine │ -│ │ -│ ┌──────────────────────────────────────────────────────────┐ │ -│ │ User's Terminal Session │ │ -│ │ │ │ -│ │ Agents run with user's permissions │ │ -│ │ Broker authenticates via Relaycast API keys │ │ -│ │ WebSocket connection is TLS-encrypted │ │ -│ │ │ │ -│ └──────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌──────────────────────────────────────────────────────────┐ │ -│ │ Relaycast Cloud │ │ -│ │ │ │ -│ │ Workspace isolation via API keys │ │ -│ │ Agent registration and authentication │ │ -│ │ Message persistence and routing │ │ -│ │ │ │ -│ └──────────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────────┘ -``` - -### 7.2 Current Security Properties - -| Property | Status | Notes | -| ---------------------- | ------ | ------------------------------- | -| Workspace isolation | ✅ | Separate API keys per workspace | -| TLS encryption | ✅ | WebSocket over TLS to Relaycast | -| Agent authentication | ✅ | API key + agent registration | -| Local file permissions | ✅ | Outbox/inbox owned by user | -| Rate limiting | ⚠️ | Server-side via Relaycast | -| Message validation | ⚠️ | Basic field presence checks | - ---- - -## 8. Design Decisions & Trade-offs - -### 8.1 Why a Rust Broker Instead of Node.js Daemon? - -**Decision**: Replace the Node.js daemon with a Rust binary. 
- -**Rationale**: - -- Single binary distribution — no Node.js runtime required -- Lower memory footprint and faster startup -- Native PTY support via `portable-pty` -- Better concurrency model for managing multiple agents - -**Trade-offs**: - -- ❌ Requires cross-compilation for multiple platforms -- ❌ Harder to prototype new features quickly -- ✅ Zero runtime dependencies for users -- ✅ Sub-millisecond message handling -- ✅ Single binary install via curl - -### 8.2 Why PTY Instead of Tmux? - -**Decision**: Use native PTY sessions instead of tmux. - -**Rationale**: - -- Eliminates tmux as a dependency -- More direct control over agent I/O -- Works on platforms without tmux -- Better process lifecycle management - -**Trade-offs**: - -- ❌ Users cannot detach/reattach to agent sessions directly -- ✅ No dependency installation required -- ✅ Cross-platform (Windows support is partial; see Known Limitations) -- ✅ More reliable output capture - -### 8.3 Why MCP Tools Instead of Output Parsing? - -**Decision**: Use MCP tools (`mcp__relaycast__message_dm_send()`, `mcp__relaycast__agent_add()`, etc.) instead of inline output parsing (`->relay:Target message`). - -**Rationale**: - -- Native integration with AI agent tool-calling capabilities -- Structured parameters with type safety -- No line-wrapping or ANSI code issues -- Works reliably across all MCP-compatible CLIs - -**Trade-offs**: - -- ❌ Requires MCP-compatible CLI -- ✅ No parsing ambiguity -- ✅ Supports multi-line messages naturally -- ✅ Structured parameters and return values -- ✅ Single-step invocation (no file write + trigger) - -### 8.4 Why Relaycast Cloud Instead of Local Sockets? - -**Decision**: Route messages through Relaycast cloud WebSocket service.
- -**Rationale**: - -- Cross-machine agent communication -- Persistent message history -- Workspace management and agent presence -- Dashboard integration - -**Trade-offs**: - -- ❌ Requires internet connection -- ❌ Introduces cloud dependency -- ✅ Cross-machine and cross-project messaging -- ✅ Persistent history and search -- ✅ Team collaboration features - ---- - -## 9. Known Limitations - -### 9.1 Message Delivery Reliability - -| Issue | Impact | Mitigation | -| ------------------------------------- | ------ | ----------------------------------- | -| Messages can be lost if agent is busy | Medium | Idle detection, retry logic | -| WebSocket disconnection | Medium | Automatic reconnection with backoff | -| Dedup cache memory growth | Low | Cache size limits | - -### 9.2 Platform Support - -| Platform | Status | Notes | -| -------- | ---------- | ---------------------------- | -| Linux | ✅ Full | Primary development platform | -| macOS | ✅ Full | Well tested | -| Windows | ⚠️ Partial | PTY support varies | - -### 9.3 Scalability - -| Metric | Current Limit | Notes | -| ----------------- | ------------- | -------------------------------- | -| Concurrent agents | ~50 | Limited by broker resources | -| Message rate | High | Limited by Relaycast rate limits | -| Message size | ~1 MiB | Practical limit | - ---- - -## 10. Future Considerations - -### 10.1 Potential Enhancements - -**Reliability**: - -- Guaranteed delivery with acknowledgment -- Persistent local queue for offline operation -- Message ordering guarantees - -**Features**: - -- Typed message schemas -- Priority queues -- Advanced workflow patterns - -### 10.2 Architectural Evolution - -``` -Current: - Agent ──▶ MCP Tools ──▶ Broker ──▶ Relaycast WS ──▶ Agent - -The MCP tool protocol with Rust broker has proven effective for -the target use case of multi-agent coordination across any CLI tool. 
-``` - ---- - -## Appendix A: File Map - -``` -agent-relay/ -├── src/ -│ ├── main.rs # Broker entry point (init, pty, headless, wrap) -│ ├── lib.rs # Library exports (auth, dedup, protocol, etc.) -│ ├── spawner.rs # Agent spawning and process management -│ ├── config.rs # Configuration handling -│ ├── protocol.rs # Protocol types and envelope definitions -│ ├── snippets.rs # Agent instruction snippets and MCP config -│ ├── cli/ -│ │ ├── bootstrap.ts # CLI entry point, command registration -│ │ ├── commands/ -│ │ │ ├── core.ts # up, down, status, spawn, bridge -│ │ │ ├── agent-management.ts # Agent CRUD operations -│ │ │ ├── messaging.ts # send, read, inbox commands -│ │ │ ├── cloud.ts # Cloud link, status, agents -│ │ │ ├── monitoring.ts # Logs, health, metrics -│ │ │ ├── auth.ts # Login, logout, SSH key auth -│ │ │ ├── setup.ts # Install, setup commands -│ │ │ └── doctor.ts # Diagnostic command -│ │ └── lib/ # Shared CLI utilities -│ └── index.ts # Package exports -├── packages/ -│ ├── sdk/ # TypeScript SDK (broker client, workflows) -│ ├── acp-bridge/ # ACP protocol bridge for editors -│ ├── config/ # Configuration loading -│ ├── hooks/ # Hook system for events -│ ├── storage/ # Message persistence (JSONL) -│ ├── utils/ # Shared utilities -│ ├── telemetry/ # Usage analytics -│ ├── trajectory/ # Work trajectory tracking -│ ├── user-directory/ # Agent directory management -│ ├── memory/ # Agent memory persistence -│ └── policy/ # Policy enforcement -├── Cargo.toml # Rust dependencies -├── package.json # Node.js dependencies -├── CLAUDE.md # Agent instructions -└── ARCHITECTURE.md # This document -``` - ---- - -## Appendix B: Environment Variables - -| Variable | Default | Description | -| ---------------------------- | --------------------------- | ------------------------------------------ | -| `AGENT_RELAY_DASHBOARD_PORT` | 3888 | Dashboard HTTP port | -| `RELAY_AGENT_NAME` | - | Agent name for broker registration | -| `RELAY_API_KEY` | - | Relaycast workspace 
API key | -| `RELAY_BASE_URL` | `https://api.relaycast.dev` | Relaycast API base URL | -| `RELAY_CHANNELS` | `general` | Comma-separated channel list | -| `AGENT_RELAY_DEBUG` | false | Enable debug logging | -| `RUST_LOG` | - | Rust log level (uses `tracing-subscriber`) | - ---- - -## Appendix C: Quick Reference - -### Starting the System - -```bash -# Start broker + dashboard -agent-relay up --dashboard - -# Spawn agents -agent-relay spawn Alice claude "Your task here" -agent-relay spawn Bob codex "Another task" -``` - -### Agent Communication (MCP Tools) - -``` -# Send a direct message -mcp__relaycast__message_dm_send(to: "Bob", text: "Please review the auth module") - -# Post to a channel -mcp__relaycast__post_message(channel: "general", text: "I've finished the database migration") -``` - -### Troubleshooting - -```bash -# Check broker status -agent-relay status - -# Run diagnostics -agent-relay doctor - -# View logs -RUST_LOG=debug agent-relay up -``` - ---- - -_Document updated for agent-relay v2.x (Rust broker architecture)_ -_Last updated: 2026_ diff --git a/BUDGET_AUDIT.md b/BUDGET_AUDIT.md deleted file mode 100644 index 908790158..000000000 --- a/BUDGET_AUDIT.md +++ /dev/null @@ -1,168 +0,0 @@ -# Token Budget Tracking — Audit Report - -## 1. 
Token Collection: Exact File Locations - -### Collection Entry Point - -- **`packages/sdk/src/workflows/cli-session-collector.ts:51-58`** — `collectCliSession()` dispatches to CLI-specific collectors based on `AgentCli` type -- **`packages/sdk/src/workflows/cli-session-collector.ts:38-49`** — `createCollector()` factory: supports `claude`, `codex`, `opencode`; returns `null` for all other CLIs - -### CLI-Specific Collectors - -| Collector | File | Token Extraction | -| ----------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Claude Code | `collectors/claude.ts:87-186` | Parses `~/.claude/projects//.jsonl`; sums `usage.input_tokens`, `usage.output_tokens`, `cache_read_input_tokens` from each `assistant` entry (lines 123-128) | -| Codex | `collectors/codex.ts:149-169` | Reads `~/.codex/state_5.sqlite` `threads` table; extracts `input_tokens`, `output_tokens`, `cache_read_tokens` columns (or falls back to `tokens_used`) | -| OpenCode | `collectors/opencode.ts:222-231` | Reads `~/.local/share/opencode/opencode.db` `message` table; sums `tokens.input`, `tokens.output`, `tokens.cache.read` from JSON `data` column | - -### CliSessionReport Shape (`cli-session-collector.ts:6-24`) - -```typescript -interface CliSessionReport { - cli: AgentCli; - tokens: { input: number; output: number; cacheRead: number } | null; - cost: number | null; // Only OpenCode populates this - durationMs: number | null; - model: string | null; - turns: number; - errors: { turn: number; text: string }[]; - finalStatus: 'completed' | 'failed' | 'unknown'; - // ... -} -``` - -## 2.
Token Data Flow - -``` -CLI session files (JSONL / SQLite) - │ - ▼ -collectCliSession() (cli-session-collector.ts:51) - │ - ▼ -captureAgentReport() (runner.ts:6623-6650) - ├─ this.agentReports.set(stepName) (runner.ts:6642) — in-memory Map - ├─ this.emit('step:agent-report') (runner.ts:6643) — event for listeners - └─ persistAgentReport() (runner.ts:7135-7143) — writes .report.json - │ - ▼ -formatRunSummaryTable() (run-summary-table.ts:41-110) - reads from agentReports Map (runner.ts:6833) - displays: Step | Status | Model | Cost | Tokens | Duration | Errors -``` - -### Key Details - -- **`agentReports`** is declared as `private readonly agentReports = new Map()` at **runner.ts:482** -- Cleared at workflow start: **runner.ts:2860** (`this.agentReports.clear()`) -- Populated post-execution per step: **runner.ts:6634-6642** -- Displayed in final summary: **runner.ts:6833** -- Token formatting in table: `run-summary-table.ts:8-12` sums `input + output + cacheRead` - -## 3. Where maxTokens Is Currently Referenced (NO Enforcement) - -| Location | Line | Usage | -| ---------------- | -------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | -| `types.ts:206` | `AgentConstraints.maxTokens?: number` | **Field definition only** | -| `runner.ts:1663` | `agentDef.constraints?.maxTokens ?? proxyConfig.defaultBudget` | Used as `budget` in **credential proxy JWT** — passed to `mintProxyToken()` for the proxy's own rate limiting. **NOT enforced by the runner itself.** | -| `runner.ts:3995` | `specialistDef.constraints?.maxTokens` | Passed as `defaultMaxTokens` to API-mode executor config. **NOT enforced during execution.** | - -**Finding: The runner reads `maxTokens` but never checks actual token consumption against the budget. There is zero enforcement at the workflow/runner level.** - -## 4. 
Timeout Enforcement Pattern to Follow - -The timeout enforcement in `waitForExitWithIdleNudging()` (**runner.ts:6338-6470**) provides the exact structural pattern for token budget enforcement: - -### Timeout Pattern Structure - -``` -1. CONFIGURATION: timeoutMs from step.timeoutMs or swarm.timeoutMs -2. LOOP: while (true) { ... } -3. TRACKING: elapsed = Date.now() - startTime -4. CHECK: remaining = timeoutMs - elapsed; if (remaining <= 0) return 'timeout' -5. WAIT: exitResult = await agent.waitForExit(waitMs) -6. GRACE: On timeout, check verification before hard-failing (runner.ts:6169-6196) -7. ESCALATION: Nudge → escalate → force-release progression -``` - -### Proposed Token Budget Enforcement (Same Structure) - -``` -1. CONFIGURATION: maxTokens from step agent's constraints.maxTokens -2. LOOP: Poll token consumption periodically during execution -3. TRACKING: currentTokens = read from agentReports or live polling -4. CHECK: if (currentTokens >= maxTokens) → trigger budget exceeded -5. WAIT: Continue waiting for exit with budget check interval -6. GRACE: On budget exceeded, allow current turn to complete -7. ESCALATION: Warn at 80% → soft-stop at 100% → force-release at 110% -``` - -### Enforcement Hook Points - -| Phase | Location | Action | -| -------------------- | ------------------------------------------------------------ | -------------------------------------------------------------- | -| **Pre-spawn** | `executeAgentStep()` (~runner.ts:6050) | Validate maxTokens is set; calculate remaining workflow budget | -| **During execution** | Inside `waitForExitWithIdleNudging()` loop (~runner.ts:6424) | Periodically poll token usage; compare against budget | -| **Post-execution** | `captureAgentReport()` (~runner.ts:6623) | Record final token count; deduct from workflow-level budget | - -### Challenge: Live Token Polling - -The current collectors (`claude.ts`, `codex.ts`, `opencode.ts`) read session files **after** the agent exits. 
For mid-execution enforcement, one of these approaches is needed: - -1. **Tail the session JSONL** (Claude) or poll the SQLite DB (Codex/OpenCode) periodically during execution -2. **Use the credential proxy's own budget tracking** — the proxy already receives the budget via JWT and could reject requests when exhausted -3. **Parse PTY output** for token usage patterns (fragile, CLI-specific) - -**Recommendation**: Option 2 (credential proxy enforcement) for hard limits, with Option 1 (periodic polling) for soft warnings and reporting. - -## 5. Edge Cases - -### 5a. Concurrent Parallel Steps Sharing a Workflow Budget - -- Currently: each step's `maxTokens` is independent (per-agent constraint) -- No workflow-level `maxTokens` field exists in `WorkflowDefinition` (types.ts:467-474) -- **Gap**: If 5 parallel agents each have `maxTokens: 100_000`, the workflow could consume 500K tokens with no aggregate cap -- **Fix needed**: Add `maxTokens` to `WorkflowDefinition` or `SwarmConfig`; maintain an `AtomicBudget` counter decremented by each step's actual consumption; use `Atomics` or a mutex for thread-safe concurrent deductions - -### 5b. Retry Attempts Consuming from the Same Budget - -- Retries are configured via `step.retries` (types.ts:552) and `AgentConstraints.retries` (types.ts:208) -- Current retry logic re-spawns the agent with the same constraints -- **Gap**: Each retry gets a fresh `maxTokens` budget (via new JWT), not the remaining budget from prior attempts -- **Fix needed**: Track cumulative tokens across retries in `StepState`; deduct prior attempt's actual consumption from retry budget; fail the step if cumulative consumption exceeds `maxTokens × (retries + 1)` or a separate `maxTokensPerStep` field - -### 5c. Non-Interactive vs Interactive Agents - -- **Interactive agents** (PTY mode, `interactive: true`): Token collection works via session file parsing after exit. Mid-execution polling possible by tailing session files. 
-- **Non-interactive agents** (`interactive: false`): Run as child processes with stdout capture. Token collection still works post-execution (collectors read the same session files). However, non-interactive agents using `preset: 'worker'` may not write session files if they're invoked with `--print` or similar flags. -- **API-mode agents** (`cli: 'api'`): Use `executeApiStep()` (runner.ts:45) — token usage comes directly from API response `usage` field. Easiest to enforce in real-time. -- **Gap**: No unified mid-execution token query interface across all three modes - -### 5d. Steps That Fail Before Collection Happens - -- `captureAgentReport()` is called in the step lifecycle regardless of success/failure (runner.ts:6623-6650) -- But if a step crashes before the CLI writes any session data (e.g., spawn failure, immediate OOM), `collectCliSession()` returns `null` (cli-session-collector.ts:53) -- **Gap**: Tokens consumed before crash are lost — the partial consumption is not tracked -- **Fix needed**: For credential proxy mode, the proxy itself tracks per-session token consumption server-side. Query the proxy for actual consumption on step failure. For non-proxy mode, accept that crash-before-write results in underreporting. - -### 5e. 
Additional Edge Case: Token Counts Available Only After Exit - -- Claude collector reads `~/.claude/projects/.../.jsonl` which is written incrementally — can be tailed -- Codex collector reads `~/.codex/state_5.sqlite` — SQLite is updated during execution, can be polled -- OpenCode collector reads `~/.local/share/opencode/opencode.db` — same as Codex -- **All three can theoretically be polled mid-execution**, but the current `CliSessionCollector` interface (`collect()`) is designed for post-execution one-shot reads, not streaming - -## Summary - -| Component | Status | Location | -| --------------------------------- | ------------------- | ------------------------------------------- | -| Token collection (post-execution) | IMPLEMENTED | cli-session-collector.ts + collectors/\*.ts | -| Token storage in memory | IMPLEMENTED | runner.ts:482 (agentReports Map) | -| Token persistence to disk | IMPLEMENTED | runner.ts:7135-7143 (\*.report.json) | -| Token display in summary | IMPLEMENTED | run-summary-table.ts | -| maxTokens field in types | DEFINED | types.ts:206 | -| maxTokens passed to proxy JWT | IMPLEMENTED | runner.ts:1663-1688 | -| maxTokens enforcement in runner | **NOT IMPLEMENTED** | — | -| Mid-execution token polling | **NOT IMPLEMENTED** | — | -| Workflow-level aggregate budget | **NOT IMPLEMENTED** | — | -| Cross-retry budget tracking | **NOT IMPLEMENTED** | — | diff --git a/CUSTOM_VERIFY_DESIGN.md b/CUSTOM_VERIFY_DESIGN.md deleted file mode 100644 index a7787aebf..000000000 --- a/CUSTOM_VERIFY_DESIGN.md +++ /dev/null @@ -1,169 +0,0 @@ -# Custom Verification Design - -## Overview - -The `custom` verification type allows workflow authors to run arbitrary shell commands -(or regex patterns) as verification gates after an agent step completes. This replaces -the need for separate deterministic steps to validate agent output. 
- -## Current Implementation Status - -Custom verification is **already implemented** in the codebase: - -- `packages/sdk/src/workflows/verification.ts` — `checkCustom()` function (lines 191-226) -- `packages/sdk/src/workflows/types.ts` — `VerificationCheck` interface (lines 621-625) -- `packages/sdk/src/workflows/schema.json` — `VerificationCheck` JSON schema - -## How It Works - -### Shell Command Mode - -The `value` field contains a shell command. After the agent step completes, the command -is executed via `execSync`. The agent's output is available as `$STEP_OUTPUT` env var. - -```yaml -verification: - type: 'custom' - value: 'cd nango-integrations && npx nango compile' -``` - -**Behavior:** - -- Exit code 0 = verification passed -- Non-zero exit code = verification failed -- stderr is captured as the verification error message -- Configurable timeout via `CUSTOM_VERIFY_TIMEOUT_MS` env var (default: 30s) -- Max output buffer: 1MB - -### Regex Mode - -Prefix the value with `regex:` to match a pattern against the step output: - -```yaml -verification: - type: 'custom' - value: 'regex:Successfully compiled' -``` - -**Behavior:** - -- Pattern is compiled as a JavaScript `RegExp` -- Tested against the step's combined output -- Invalid regex returns a clear error message - -## Retry Integration - -When verification fails and `retries` is configured, the runner injects failure -context into the retry prompt (runner.ts, lines 4195-4202): - -``` -[RETRY - Attempt 2/3] -Previous attempt failed: Verification failed for "step-name": custom check failed - -Previous output (last 2000 chars): - ---- - -``` - -This gives the agent diagnostic context from the failed verification command, -enabling it to fix the issue on retry. 
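The two modes described under How It Works can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `verification.ts`. The names `checkCustomSketch` and `CustomCheckResult`, and the exact error strings, are hypothetical. The 30s default timeout, the `CUSTOM_VERIFY_TIMEOUT_MS` override, the 1MB buffer, the `regex:` prefix and the `STEP_OUTPUT` variable come from the description above.

```typescript
import { execSync } from 'node:child_process';

// Assumed result shape, mirroring the { passed, stdout?, error? } return described for checkCustom().
export interface CustomCheckResult {
  passed: boolean;
  stdout?: string;
  error?: string;
}

const DEFAULT_TIMEOUT_MS = 30_000; // documented default: 30s
const MAX_BUFFER_BYTES = 1024 * 1024; // documented max output buffer: 1MB

// Hypothetical stand-in for checkCustom(value, output, cwd); not the verification.ts source.
export function checkCustomSketch(value: string, output: string, cwd: string): CustomCheckResult {
  // Regex mode: "regex:<pattern>" is compiled and tested against the step's combined output.
  if (value.startsWith('regex:')) {
    let pattern: RegExp;
    try {
      pattern = new RegExp(value.slice('regex:'.length));
    } catch (err) {
      return { passed: false, error: `Invalid regex: ${(err as Error).message}` };
    }
    return pattern.test(output)
      ? { passed: true }
      : { passed: false, error: `Output did not match ${pattern}` };
  }

  // Shell mode: exit code 0 passes; the agent output is exposed to the command as $STEP_OUTPUT.
  const timeout = Number(process.env.CUSTOM_VERIFY_TIMEOUT_MS) || DEFAULT_TIMEOUT_MS;
  try {
    const stdout = execSync(value, {
      cwd,
      env: { ...process.env, STEP_OUTPUT: output },
      timeout,
      maxBuffer: MAX_BUFFER_BYTES,
      stdio: 'pipe',
      encoding: 'utf8',
    });
    return { passed: true, stdout };
  } catch (err) {
    // execSync throws on a non-zero exit code or a timeout; stderr is attached to the error object.
    const e = err as { stderr?: string | Buffer; message: string };
    const stderr = e.stderr ? e.stderr.toString().trim() : '';
    return { passed: false, error: stderr || e.message };
  }
}
```

With this shape, `checkCustomSketch('regex:Successfully compiled', stepOutput, process.cwd())` covers the regex example. The YAML shell example would run `cd nango-integrations && npx nango compile` through `/bin/sh` in the step's working directory.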
- -## Type Definition - -```typescript -// packages/sdk/src/workflows/types.ts -export interface VerificationCheck { - type: 'output_contains' | 'exit_code' | 'file_exists' | 'custom'; - value: string; - description?: string; -} -``` - -## Implementation Details - -### `checkCustom(value, output, cwd)` — verification.ts - -```typescript -function checkCustom(value, output, cwd): { passed: boolean; stdout?: string; error?: string }; -``` - -1. **Regex branch** (`value.startsWith('regex:')`) - - Strips prefix, compiles RegExp, tests against output - - Returns `{ passed: false, error }` on mismatch or invalid regex - -2. **Shell command branch** (default) - - Runs `execSync(value, { cwd, env: { ...process.env, STEP_OUTPUT: output } })` - - Timeout: `CUSTOM_VERIFY_TIMEOUT_MS` (default 30000) - - stdio: pipe (captures stdout + stderr) - - On success: `{ passed: true, stdout }` - - On failure: `{ passed: false, error: stderr || error.message }` - -### Side Effects on Failure - -When custom verification fails, `runVerification()` records: - -- A `verification_observed` tool side effect with `passed: false` -- A `verification_failed` coordination signal in the step's evidence record -- If `allowFailure` is false (default), throws `WorkflowCompletionError` - -### Side Effects on Success - -- A `verification_observed` tool side effect with `passed: true` -- A `verification_passed` coordination signal -- Returns `{ passed: true, completionReason: 'completed_verified' }` - -## Callback Variant (Future / Programmatic Use) - -For embedding the runner in another system where the host provides verification -logic programmatically, a callback variant is reserved: - -```typescript -// Proposed extension to VerificationCheck: -interface VerificationCheck { - type: 'output_contains' | 'exit_code' | 'file_exists' | 'custom'; - value: string; - description?: string; - /** Optional async callback for programmatic verification. 
- * When provided with type: 'custom', the callback is invoked instead of - * running the value as a shell command. */ - callback?: (output: string) => Promise<boolean> | boolean; -} -``` - -**Behavior:** - -- If `callback` is present and `type === 'custom'`, invoke the callback -- The callback receives the step's combined output -- Return `true` = passed, `false` = failed -- The `value` field serves as a human-readable label in this mode -- Falls back to shell command execution if no callback is provided - -**Note:** This callback field cannot be expressed in YAML — it's only available -when using the runner programmatically via the SDK. The JSON schema does not -include it; it lives only in the TypeScript type. - -## Backwards Compatibility - -- Existing workflows using `{ type: 'custom', value: '' }` work unchanged -- The `value` field is always required (enforced by schema) -- Empty `value` with no callback will execute an empty command, which typically - succeeds (exit 0) — authors should always provide a meaningful command - -## Example Workflow - -```yaml -workflows: - - name: build-and-verify - steps: - - name: implement-feature - agent: coder - task: 'Implement the new API endpoint' - verification: - type: 'custom' - value: 'cd nango-integrations && npx nango compile' - description: 'Ensure Nango integration compiles' - retries: 2 -``` - -On failure, the coder agent receives the compile errors in its retry prompt -and can fix the issues without a separate verification step. diff --git a/DESIGN.md b/DESIGN.md deleted file mode 100644 index c41a3ccad..000000000 --- a/DESIGN.md +++ /dev/null @@ -1,645 +0,0 @@ -# Credential Proxy — Design Document - -## Problem - -Nango runs AI agents in sandboxes that need LLM API access (OpenRouter, Anthropic, OpenAI). Today, raw API keys are passed as environment variables — agents can exfiltrate them. LiteLLM was rejected (heavy Python server).
We need a lightweight, transparent proxy that: - -- Hides real API keys from sandbox agents -- Validates short-lived JWTs instead of long-lived secrets -- Forwards LLM requests unchanged (agents don't know they're proxied) -- Meters token usage per workspace/session - ---- - -## Package Location - -``` -packages/credential-proxy/ -├── package.json -├── tsconfig.json -├── src/ -│ ├── index.ts # Hono app factory + exports -│ ├── router.ts # Route definitions (/v1/chat/completions, /v1/messages) -│ ├── jwt.ts # JWT validation, claims extraction -│ ├── credential-store.ts # Interface to relay's encrypted credential storage -│ ├── metering.ts # Token usage extraction and recording -│ ├── providers/ -│ │ ├── types.ts # ProviderAdapter interface -│ │ ├── openai.ts # OpenAI adapter -│ │ ├── anthropic.ts # Anthropic adapter -│ │ └── openrouter.ts # OpenRouter adapter -│ └── errors.ts # Error types and HTTP error responses -├── test/ -│ ├── jwt.test.ts -│ ├── router.test.ts -│ ├── metering.test.ts -│ └── providers/ -│ ├── openai.test.ts -│ ├── anthropic.test.ts -│ └── openrouter.test.ts -└── README.md -``` - -Follows the same structure as `packages/gateway/` — Hono replaces the raw HTTP handling, but the adapter dispatch pattern is identical. 
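The router's proxy handler, shown later in this document, calls `resolveAdapter(claims.provider)` without defining it. Here is a minimal sketch of that dispatch. It assumes a pared-down adapter shape with only `type` and `baseUrl`. The `AdapterInfo` name is hypothetical, and the full `ProviderAdapter` interface also builds headers and extracts usage. The base URLs are the ones listed under Adapter Implementations.

```typescript
// Assumed provider union, matching the `provider` JWT claim.
type ProviderType = 'openai' | 'anthropic' | 'openrouter';

// Hypothetical pared-down adapter: the full ProviderAdapter also builds headers and extracts usage.
interface AdapterInfo {
  readonly type: ProviderType;
  readonly baseUrl: string;
}

// One entry per provider; base URLs as listed under Adapter Implementations.
const ADAPTERS: Record<ProviderType, AdapterInfo> = {
  openai: { type: 'openai', baseUrl: 'https://api.openai.com' },
  anthropic: { type: 'anthropic', baseUrl: 'https://api.anthropic.com' },
  openrouter: { type: 'openrouter', baseUrl: 'https://openrouter.ai/api' },
};

export function resolveAdapter(provider: string): AdapterInfo {
  // hasOwnProperty rejects inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(ADAPTERS, provider)) {
    throw new Error(`Unsupported provider: ${provider}`);
  }
  return ADAPTERS[provider as ProviderType];
}
```

Keying a `Record` by the provider union means that adding a provider to the union without a matching adapter entry fails to compile.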
- ---- - -## JWT Claims Schema - -```typescript -export interface ProxyJwtClaims { - /** Workspace ID (e.g., "wks_abc123") */ - sub: string; - - /** Fixed audience — must be "relay-llm-proxy" */ - aud: 'relay-llm-proxy'; - - /** LLM provider this token authorizes */ - provider: 'openai' | 'anthropic' | 'openrouter'; - - /** Reference to encrypted credential in relay's credential store */ - credentialId: string; - - /** Optional max tokens for this session (input + output combined) */ - budget?: number; - - /** Issued-at (unix seconds) */ - iat: number; - - /** Expiration (unix seconds) — default 15 min TTL */ - exp: number; - - /** Unique token ID for audit trail */ - jti: string; - - /** Issuer — "relay-credential-proxy" */ - iss: string; -} -``` - -**TTL policy:** 15 minutes default. Tokens are minted by the relay cloud API when a sandbox session starts. The sandbox receives only the JWT — never the underlying API key. - -**Signing:** HMAC-SHA256, following the pattern in `packages/sdk/src/provisioner/token.ts`. The signing secret is a per-workspace key stored in relay cloud. Because HMAC is symmetric, the verification secret the proxy receives via environment variable or runtime config is that same key. Anything holding it, the proxy included, could also mint valid tokens, so it must be protected accordingly. - ---- - -## Request Flow - -``` -Agent (sandbox) - │ - │ POST /v1/chat/completions (or /v1/messages) - │ Authorization: Bearer - │ - ▼ -┌─────────────────────────────┐ -│ Credential Proxy (Hono) │ -│ │ -│ 1. Extract JWT from header │ -│ 2. Validate signature+exp │ -│ 3. Check budget (if set) │ -│ 4. Resolve real API key │ -│ via credentialId │ -│ 5. Select provider adapter │ -│ 6. Forward request with │ -│ real API key │ -│ 7. Stream response back │ -│ 8. Extract token usage │ -│ 9. Record metering event │ -└─────────────────────────────┘ - │ - ▼ -Provider API (OpenAI / Anthropic / OpenRouter) -``` - ---- - -## Provider Adapter Pattern - -Mirrors `packages/gateway/src/types.ts` — each surface adapter normalizes inbound/outbound messages.
Here, each provider adapter normalizes auth headers and usage extraction. - -```typescript -// src/providers/types.ts - -export interface ProviderAdapter { - /** Provider identifier */ - readonly type: 'openai' | 'anthropic' | 'openrouter'; - - /** The upstream base URL for this provider */ - readonly baseUrl: string; - - /** - * Build the outgoing request headers. - * Replaces the proxy JWT with the real API key in the - * provider-specific auth header format. - */ - buildHeaders(apiKey: string, incomingHeaders: Headers): Headers; - - /** - * Map the incoming proxy path to the upstream provider path. - * e.g., /v1/chat/completions → /v1/chat/completions (OpenAI) - * /v1/messages → /v1/messages (Anthropic) - */ - upstreamPath(proxyPath: string): string; - - /** - * Extract token usage from the provider's response body. - * Called after the full response is buffered (non-streaming) - * or after the stream ends (streaming). - */ - extractUsage(responseBody: unknown): TokenUsage | null; - - /** - * Extract token usage from a streaming chunk (SSE data). - * Returns null for non-final chunks. Returns usage from the - * final chunk that includes it (e.g., OpenAI's last chunk - * with usage field, Anthropic's message_delta event).
- */ - extractStreamingUsage(chunk: string): TokenUsage | null; -} - -export interface TokenUsage { - inputTokens: number; - outputTokens: number; - totalTokens: number; - model?: string; -} -``` - -### Adapter Implementations - -**OpenAI** (`src/providers/openai.ts`): - -- Base URL: `https://api.openai.com` -- Auth header: `Authorization: Bearer ` -- Path: `/v1/chat/completions` (passthrough) -- Usage: `response.usage.prompt_tokens`, `response.usage.completion_tokens` -- Streaming: the final SSE chunk contains `usage` only when `stream_options.include_usage` is set, so the proxy must inject this option into the request body (see Open Question 4) - -**Anthropic** (`src/providers/anthropic.ts`): - -- Base URL: `https://api.anthropic.com` -- Auth header: `x-api-key: ` (NOT Bearer) -- Also sets: `anthropic-version: 2023-06-01` -- Path: `/v1/messages` (passthrough) -- Usage: `response.usage.input_tokens`, `response.usage.output_tokens` -- Streaming: the closing `message_delta` event carries `usage` (cumulative output tokens); input tokens are reported earlier in `message_start`, so the adapter must combine both events - -**OpenRouter** (`src/providers/openrouter.ts`): - -- Base URL: `https://openrouter.ai/api` -- Auth header: `Authorization: Bearer ` -- Path: `/v1/chat/completions` (passthrough — OpenAI-compatible) -- Usage: `response.usage.prompt_tokens`, `response.usage.completion_tokens` -- Streaming: same as OpenAI format - ---- - -## Router Design - -```typescript -// src/router.ts - -import { Hono } from 'hono'; -import type { ProxyJwtClaims } from './jwt.js'; - -const app = new Hono(); - -// Health check -app.get('/health', (c) => c.json({ status: 'ok' })); - -// OpenAI-compatible endpoint -app.post('/v1/chat/completions', jwtMiddleware, proxyHandler); - -// Anthropic-compatible endpoint -app.post('/v1/messages', jwtMiddleware, proxyHandler); -``` - -**Route → Provider mapping:** - -- `/v1/chat/completions` → uses `claims.provider` to select OpenAI or OpenRouter adapter -- `/v1/messages` → Anthropic adapter (validated against `claims.provider === "anthropic"`) - -If the route doesn't match the JWT's `provider` claim, return 400.
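The route-to-provider rule above can be expressed as a small pure check that runs before any upstream call. This is a sketch under stated assumptions. `checkRouteProvider` and `RouteCheck` are hypothetical names, and returning 404 for an unknown path is an assumption, since the router only registers the two routes.

```typescript
// Assumed provider union, matching the `provider` JWT claim.
type ProviderType = 'openai' | 'anthropic' | 'openrouter';

// Which JWT provider claims each proxy route may serve, per the mapping above.
// A Map avoids accidental hits on inherited object keys such as "toString".
const ROUTE_PROVIDERS = new Map<string, readonly ProviderType[]>([
  ['/v1/chat/completions', ['openai', 'openrouter']],
  ['/v1/messages', ['anthropic']],
]);

export type RouteCheck =
  | { ok: true; provider: ProviderType }
  | { ok: false; status: 400 | 404; error: string };

export function checkRouteProvider(path: string, provider: ProviderType): RouteCheck {
  const allowed = ROUTE_PROVIDERS.get(path);
  if (!allowed) return { ok: false, status: 404, error: 'Unknown route' };
  if (!allowed.includes(provider)) {
    // Corresponds to the "Provider mismatch" row of the error table.
    return { ok: false, status: 400, error: 'Provider mismatch' };
  }
  return { ok: true, provider };
}
```

Keeping the check pure, with no Hono context, makes the mapping testable without standing up the app. The handler would translate a failed result into `c.json({ error }, status)`.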
- -**jwtMiddleware** extracts and validates the JWT, attaches claims to context: - -```typescript -async function jwtMiddleware(c: Context, next: Next) { - const token = c.req.header('Authorization')?.replace('Bearer ', ''); - if (!token) return c.json({ error: 'Missing authorization' }, 401); - - const claims = await validateJwt(token, signingSecret); - c.set('claims', claims); - await next(); -} -``` - -**proxyHandler** orchestrates the forward-and-stream: - -```typescript -async function proxyHandler(c: Context) { - const claims = c.get('claims') as ProxyJwtClaims; - const adapter = resolveAdapter(claims.provider); - const apiKey = await credentialStore.resolve(claims.credentialId); - - // Budget check - if (claims.budget) { - const used = await metering.getSessionUsage(claims.jti); - if (used >= claims.budget) { - return c.json({ error: 'Token budget exceeded' }, 429); - } - } - - // Build upstream request - const headers = adapter.buildHeaders(apiKey, c.req.raw.headers); - const upstreamUrl = `${adapter.baseUrl}${adapter.upstreamPath(c.req.path)}`; - const body = await c.req.text(); - - const isStreaming = JSON.parse(body).stream === true; - - const upstream = await fetch(upstreamUrl, { - method: 'POST', - headers, - body, - }); - - if (!upstream.ok) { - // Pass through provider errors unchanged - return new Response(upstream.body, { - status: upstream.status, - headers: upstream.headers, - }); - } - - if (isStreaming) { - return streamResponse(c, upstream, adapter, claims); - } else { - return bufferedResponse(c, upstream, adapter, claims); - } -} -``` - -### Streaming Strategy - -For streaming responses, the proxy pipes SSE chunks through unchanged, but taps each chunk to detect usage: - -```typescript -async function streamResponse(c, upstream, adapter, claims) { - const { readable, writable } = new TransformStream(); - const writer = writable.getWriter(); - const reader = upstream.body.getReader(); - const decoder = new TextDecoder(); - - let finalUsage: 
TokenUsage | null = null; - - // Pipe in background - (async () => { - try { - while (true) { - const { done, value } = await reader.read(); - if (done) break; - // Note: an SSE event can straddle two chunks; a production version should buffer to a blank line before parsing - const text = decoder.decode(value, { stream: true }); - const usage = adapter.extractStreamingUsage(text); - if (usage) finalUsage = usage; - await writer.write(value); // Pass through unchanged - } - } catch (err) { - // Upstream read or client write failed: abort so the client does not hang - await writer.abort(err).catch(() => {}); - return; - } - await writer.close(); - - // Record usage after stream ends - if (finalUsage) { - await metering.record(claims, finalUsage); - } - })(); - - return new Response(readable, { - headers: { - 'content-type': 'text/event-stream', - 'cache-control': 'no-cache', - connection: 'keep-alive', - }, - }); -} -``` - ---- - -## JWT Validation - -```typescript -// src/jwt.ts - -import { createHmac, timingSafeEqual } from 'node:crypto'; -import { JwtError } from './errors.js'; - -// ProxyJwtClaims (see JWT Claims Schema) is declared and exported from this module; router.ts imports it from './jwt.js'. - -const ALLOWED_AUDIENCES = ['relay-llm-proxy'] as const; -const CLOCK_SKEW_SECONDS = 30; - -const base64urlDecode = (segment: string): string => Buffer.from(segment, 'base64url').toString('utf8'); - -export function validateJwt(token: string, secret: string): ProxyJwtClaims { - const parts = token.split('.'); - if (parts.length !== 3) throw new JwtError('Malformed token'); - - const [headerB64, payloadB64, signatureB64] = parts; - - // 1. Verify signature (HMAC-SHA256, timing-safe) - const unsigned = `${headerB64}.${payloadB64}`; - const expected = createHmac('sha256', secret).update(unsigned).digest('base64url'); - - // timingSafeEqual throws on a length mismatch, so compare lengths first - const expectedBuf = Buffer.from(expected); - const actualBuf = Buffer.from(signatureB64); - if (expectedBuf.length !== actualBuf.length || !timingSafeEqual(expectedBuf, actualBuf)) { - throw new JwtError('Invalid signature'); - } - - // 2. Decode and parse - const header = JSON.parse(base64urlDecode(headerB64)); - if (header.alg !== 'HS256') throw new JwtError('Unsupported algorithm'); - - const claims = JSON.parse(base64urlDecode(payloadB64)) as ProxyJwtClaims; - - // 3.
Validate standard claims - const now = Math.floor(Date.now() / 1000); - if (claims.exp < now - CLOCK_SKEW_SECONDS) { - throw new JwtError('Token expired'); - } - if (claims.aud !== 'relay-llm-proxy') { - throw new JwtError('Invalid audience'); - } - if (!['openai', 'anthropic', 'openrouter'].includes(claims.provider)) { - throw new JwtError('Invalid provider'); - } - - return claims; -} -``` - -Follows the same HMAC-SHA256 + `timingSafeEqual` pattern used in `packages/sdk/src/provisioner/token.ts` and `packages/gateway/src/adapters/slack.ts`. - ---- - -## Credential Store Integration - -The proxy resolves real API keys via `credentialId` from the JWT claims. This integrates with relay cloud's encrypted credential storage (`packages/cloud/src/`). - -```typescript -// src/credential-store.ts - -export interface CredentialStore { - /** - * Resolve a credentialId to the decrypted API key. - * The credentialId is an opaque reference stored in the JWT claims. - * The actual key is encrypted at rest in relay cloud (S3 + KMS). - */ - resolve(credentialId: string): Promise<string>; -} -``` - -### Implementation Options - -**Option A — API call to relay cloud** (recommended for production): - -```typescript -export class CloudCredentialStore implements CredentialStore { - constructor( - private readonly apiUrl: string, - private readonly serviceToken: string - ) {} - - async resolve(credentialId: string): Promise<string> { - const res = await fetch(`${this.apiUrl}/api/v1/credentials/${credentialId}`, { - headers: { authorization: `Bearer ${this.serviceToken}` }, - }); - if (!res.ok) throw new CredentialError(`Failed to resolve: ${res.status}`); - const { apiKey } = (await res.json()) as { apiKey: string }; - return apiKey; - } -} -``` - -This follows the same pattern as `packages/cloud/src/auth.ts` — the proxy never holds decryption keys; cloud API decrypts via KMS and returns the plaintext key over a TLS-protected internal channel.
- -**Option B — Local cache with TTL** (for performance): - -```typescript -export class CachedCredentialStore implements CredentialStore { - private cache = new Map<string, { key: string; expiresAt: number }>(); - private readonly ttlMs = 5 * 60 * 1000; // 5 min cache - - constructor(private readonly inner: CredentialStore) {} - - async resolve(credentialId: string): Promise<string> { - const cached = this.cache.get(credentialId); - if (cached && cached.expiresAt > Date.now()) return cached.key; - - const key = await this.inner.resolve(credentialId); - this.cache.set(credentialId, { key, expiresAt: Date.now() + this.ttlMs }); - return key; - } -} -``` - -The cache must be bounded (LRU or size cap) and the TTL kept short since credentials can be rotated. - ---- - -## Metering Data Model - -```typescript -// src/metering.ts - -export interface MeteringEvent { - /** Unique event ID */ - id: string; - - /** ISO 8601 timestamp */ - timestamp: string; - - /** From JWT claims */ - workspaceId: string; // claims.sub - provider: string; // claims.provider - credentialId: string; // claims.credentialId - tokenId: string; // claims.jti (for budget tracking) - - /** From provider response */ - model: string; // e.g., "gpt-4o", "claude-sonnet-4-20250514" - inputTokens: number; - outputTokens: number; - totalTokens: number; - - /** Request metadata */ - streaming: boolean; - statusCode: number; - latencyMs: number; -} -``` - -### Recording Strategy - -**Phase 1 — Append to local log** (simple, works everywhere): - -```typescript -export class MeteringRecorder { - async record(claims: ProxyJwtClaims, usage: TokenUsage, meta: RequestMeta): Promise<void> { - const event: MeteringEvent = { - id: crypto.randomUUID(), - timestamp: new Date().toISOString(), - workspaceId: claims.sub, - provider: claims.provider, - credentialId: claims.credentialId, - tokenId: claims.jti, - model: usage.model ??
'unknown', - inputTokens: usage.inputTokens, - outputTokens: usage.outputTokens, - totalTokens: usage.totalTokens, - streaming: meta.streaming, - statusCode: meta.statusCode, - latencyMs: meta.latencyMs, - }; - // Emit to configured sink (stdout JSON line, or POST to metering API) - this.sink.emit(event); - } - - async getSessionUsage(tokenId: string): Promise<number> { - // Sum totalTokens for this jti (for budget enforcement) - return this.sink.sumByTokenId(tokenId); - } -} -``` - -**Phase 2 — Push to relay cloud metering API** (for billing): - -- Batch events and flush every N seconds or N events -- POST to `/api/v1/metering/events` -- Cloud aggregates per workspace for billing - -**Metering sinks** (pluggable): - -- `StdoutSink` — JSON lines to stdout (Lambda CloudWatch / local dev) -- `ApiSink` — POST to relay cloud metering endpoint -- `InMemorySink` — for tests and budget enforcement in single-process mode - ---- - -## Error Handling - -| Error Condition | HTTP Status | Response Body | -| ---------------------------------- | ------------ | --------------------------------------------- | -| Missing Authorization header | 401 | `{ "error": "Missing authorization" }` | -| Malformed JWT | 401 | `{ "error": "Malformed token" }` | -| Invalid JWT signature | 401 | `{ "error": "Invalid signature" }` | -| Expired JWT | 401 | `{ "error": "Token expired" }` | -| Wrong audience claim | 401 | `{ "error": "Invalid audience" }` | -| Provider mismatch (route vs claim) | 400 | `{ "error": "Provider mismatch" }` | -| Credential not found | 502 | `{ "error": "Credential resolution failed" }` | -| Budget exceeded | 429 | `{ "error": "Token budget exceeded" }` | -| Provider returns error | pass-through | Provider's original error response | -| Provider unreachable | 502 | `{ "error": "Upstream unreachable" }` | -| Provider rate limit (429) | 429 | Provider's original 429 response | - -**Design principle:** Provider errors are passed through unchanged.
The agent SDK already handles OpenAI/Anthropic error formats — the proxy should not transform them. Only proxy-level errors (JWT, budget, credential resolution) use the proxy's own error format. - -```typescript -// src/errors.ts - -export class ProxyError extends Error { - constructor( - message: string, - public readonly status: number, - public readonly code: string - ) { - super(message); - } -} - -export class JwtError extends ProxyError { - constructor(message: string) { - super(message, 401, 'jwt_error'); - } -} - -export class CredentialError extends ProxyError { - constructor(message: string) { - super(message, 502, 'credential_error'); - } -} - -export class BudgetExceededError extends ProxyError { - constructor() { - super('Token budget exceeded', 429, 'budget_exceeded'); - } -} -``` - -Hono error handler catches `ProxyError` and returns structured JSON: - -```typescript -app.onError((err, c) => { - if (err instanceof ProxyError) { - return c.json({ error: err.message, code: err.code }, err.status); - } - console.error('Unexpected error:', err); - return c.json({ error: 'Internal server error' }, 500); -}); -``` - ---- - -## Deployment Targets - -Hono runs on all of these without changes to the app code; only the entry point differs: - -| Target | Entry Point | Notes | -| ---------------------- | ------------------------- | ------------------------- | -| **Node.js** | `@hono/node-server` | Local dev, Docker, EC2 | -| **AWS Lambda** | `hono/aws-lambda` | Nango's likely deployment | -| **Cloudflare Workers** | `hono/cloudflare-workers` | Edge deployment | - -The `src/index.ts` exports the Hono app; the deployment adapter wraps it: - -```typescript -// src/index.ts -export { createProxy } from './router.js'; - -// For Node.js standalone: -// import { serve } from '@hono/node-server'; -// import { createProxy } from './index.js'; -// serve({ fetch: createProxy({ ... }).fetch, port: 3001 }); -``` - ---- - -## Security Considerations - -1.
**No key exposure** — API keys never leave the proxy process. They are fetched from the credential store, used in the upstream request, and discarded. Never logged. - -2. **Short-lived tokens** — 15 min default TTL. Even if a JWT leaks, the blast radius is time-bounded and budget-capped. - -3. **Budget enforcement** — Optional per-session token budget prevents runaway costs from compromised or buggy agents. - -4. **Timing-safe comparison** — JWT signature validation uses `timingSafeEqual` to prevent timing attacks (same pattern as gateway's Slack signature verification). - -5. **No credential caching without TTL** — If caching is enabled, it's bounded and short-lived. Credential rotation takes effect within the cache TTL. - -6. **Provider error passthrough** — The proxy doesn't leak internal state in error messages. Provider errors are forwarded as-is; proxy errors use minimal, fixed messages. - -7. **Audit trail** — Every request is metered with workspace, provider, model, and token ID. Combined with JWT `jti`, this enables per-session forensics. - ---- - -## Integration with Existing Packages - -| Package | Integration | -| ------------------ | ----------------------------------------------------------------------------------------------------------- | -| `packages/sdk` | JWT minting functions extended to mint proxy tokens; `TokenClaims` type extended with proxy-specific fields | -| `packages/cloud` | Credential store API serves decrypted keys to the proxy; new `/api/v1/credentials/:id` endpoint | -| `packages/gateway` | No direct integration; shared adapter pattern for consistency | -| `packages/config` | Proxy configuration (signing secret, credential store URL) follows existing config patterns | - ---- - -## Open Questions - -1. **Multi-region credential store** — Should the proxy cache credentials regionally, or always call the central credential store? Latency vs. consistency tradeoff. - -2. 
**Token renewal** — Should the proxy support a `/v1/token/refresh` endpoint, or should the orchestrator (Nango) mint new tokens directly from relay cloud? - -3. **Model allowlisting** — Should the JWT claims include an allowed model list, or is provider-level access sufficient? - -4. **Request body inspection** — Should the proxy inspect/modify request bodies (e.g., inject `stream_options.include_usage` for OpenAI), or keep the body strictly opaque? diff --git a/PROXY_INTEGRATION.md b/PROXY_INTEGRATION.md deleted file mode 100644 index 5ac5345cc..000000000 --- a/PROXY_INTEGRATION.md +++ /dev/null @@ -1,394 +0,0 @@ -# Credential Proxy Integration Plan - -## Overview - -Integrate the credential proxy into the workflow runner so that agents receive proxy JWTs instead of raw API keys. When `credentials.proxy: true` is set on an agent, the runner mints a scoped JWT and injects proxy env vars — the agent never sees the real API key. - ---- - -## 1. New Config Fields - -### `agents[].credentials` (AgentDefinition) - -Add an optional `credentials` block to `AgentDefinition` in `packages/sdk/src/workflows/types.ts`: - -```typescript -export interface AgentCredentials { - /** Opt-in to credential proxy mode. When true, the runner mints a proxy JWT - * and injects RELAY_LLM_PROXY_URL + RELAY_LLM_PROXY_TOKEN instead of raw keys. */ - proxy?: boolean; - /** Override the default budget (max tokens) for this agent's proxy session. */ - budget?: number; - /** Override which providers this agent can access (defaults to all configured). */ - providers?: ProviderType[]; -} - -export interface AgentDefinition { - // ... existing fields ... - credentials?: AgentCredentials; -} -``` - -### `swarm.credentialProxy` (SwarmConfig) - -Add an optional `credentialProxy` block to `SwarmConfig`: - -```typescript -export interface CredentialProxyConfig { - /** The proxy endpoint URL (e.g. "https://agentrelay.com/llm-proxy"). */ - proxyUrl: string; - /** JWT signing secret. 
Supports env var reference: "$RELAY_PROXY_SECRET". */ - jwtSecret: string; - /** Default max-token budget per agent session. */ - defaultBudget?: number; - /** Provider-to-credential mapping. */ - providers: Partial<Record<ProviderType, { credentialId: string }>>; -} - -export interface SwarmConfig { - // ... existing fields ... - credentialProxy?: CredentialProxyConfig; -} -``` - ---- - -## 2. Runner Modifications - -All changes in `packages/sdk/src/workflows/runner.ts`. - -### 2a. Import credential-proxy JWT minting - -```typescript -import { mintProxyToken, type ProxyTokenClaims } from '@agent-relay/credential-proxy/jwt'; -``` - -### 2b. New instance state - -```typescript -/** Minted proxy tokens keyed by `${agentName}:${provider}`. */ -private proxyTokens = new Map<string, string>(); -``` - -### 2c. Mint tokens in `provisionAgents()` (~line 1547) - -After the existing provisioning loop, add proxy token minting: - -```typescript -// ── Credential proxy provisioning ────────────────────────────────── -const proxyConfig = config.swarm.credentialProxy; -if (proxyConfig) { - for (const agent of config.agents) { - if (!agent.credentials?.proxy) continue; - - const providers = agent.credentials.providers ?? (Object.keys(proxyConfig.providers) as ProviderType[]); - - // Mint one JWT per provider per agent; the spawn sites (2e/2f) look tokens up by the agent-name prefix - for (const provider of providers) { - const providerConfig = proxyConfig.providers[provider]; - if (!providerConfig) continue; - - const claims: ProxyTokenClaims = { - sub: `${this.workspaceId}:${agent.name}`, - aud: 'relay-llm-proxy', - provider, - credentialId: providerConfig.credentialId, - budget: agent.credentials.budget ?? proxyConfig.defaultBudget, - }; - - // Resolve a "$ENV_VAR" reference; fail loudly rather than signing with the literal placeholder - let secret = proxyConfig.jwtSecret; - if (secret.startsWith('$')) { - const envName = secret.slice(1); - const resolved = process.env[envName]; - if (!resolved) throw new Error(`credentialProxy.jwtSecret references unset env var ${envName}`); - secret = resolved; - } - - const token = await mintProxyToken(claims, secret); - // Key is always "agentName:provider" - this.proxyTokens.set(`${agent.name}:${provider}`, token); - } - } -} -``` - -### 2d. Modify `getRelayEnv()` (~line 1535) - -No changes needed here — proxy env vars are injected at the spawn site (2e/2f) rather than globally, because only proxy-enabled agents should receive them. - -### 2e. Modify `execNonInteractive()` (~line 5572) - -After the existing `agentToken`/`mount` injection block, add proxy env injection: - -```typescript -// ── Credential proxy env injection ───────────────────────────────── -const proxyConfig = this.currentConfig?.swarm?.credentialProxy; -if (proxyConfig && agentDef.credentials?.proxy) { - const cliOverrides = resolveCliBaseUrlOverrides(agentDef.cli, proxyConfig.proxyUrl); - Object.assign(env, cliOverrides); - - // Inject proxy token(s) — find all tokens for this agent - const tokenPrefix = `${agentDef.name}:`; - for (const [key, token] of this.proxyTokens) { - if (key.startsWith(tokenPrefix)) { - const provider = key.slice(tokenPrefix.length); - env[`RELAY_LLM_PROXY_TOKEN_${provider.toUpperCase()}`] = token; - } - } - env.RELAY_LLM_PROXY_URL = proxyConfig.proxyUrl; - - // Strip raw API keys so the agent can't bypass the proxy - delete env.OPENAI_API_KEY; - delete env.ANTHROPIC_API_KEY; - delete env.OPENROUTER_API_KEY; -} -``` - -### 2f.
Modify `spawnAndWait()` (~line 5831) - -In the `spawnOptions` construction, pass proxy env via the spawn options: - -```typescript -const spawnEnvOverrides: Record = {}; -const proxyConfig = this.currentConfig?.swarm?.credentialProxy; -if (proxyConfig && agentDef.credentials?.proxy) { - const cliOverrides = resolveCliBaseUrlOverrides(agentDef.cli, proxyConfig.proxyUrl); - Object.assign(spawnEnvOverrides, cliOverrides); - - for (const [key, token] of this.proxyTokens) { - if (key.startsWith(`${agentDef.name}:`)) { - const provider = key.split(':')[1]; - spawnEnvOverrides[`RELAY_LLM_PROXY_TOKEN_${provider.toUpperCase()}`] = token; - } - } - spawnEnvOverrides.RELAY_LLM_PROXY_URL = proxyConfig.proxyUrl; -} - -// Pass spawnEnvOverrides into spawnOptions.env (needs relay.spawnPty to accept env) -``` - -### 2g. Modify `filteredEnv()` (~line 150) - -Add a `stripApiKeys` parameter: - -```typescript -function filteredEnv( - extra?: Record, - options?: { stripApiKeys?: boolean } -): Record { - const env: Record = {}; - const stripKeys = new Set( - options?.stripApiKeys ? ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'OPENROUTER_API_KEY'] : [] - ); - for (const key of ENV_ALLOWLIST) { - if (stripKeys.has(key)) continue; - if (process.env[key] !== undefined) { - env[key] = process.env[key]; - } - } - if (extra) { - Object.assign(env, extra); - } - return env; -} -``` - -Note: Currently none of `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `OPENROUTER_API_KEY` are in the `ENV_ALLOWLIST` (line 113-147), so they already do NOT propagate through `filteredEnv()`. They would only leak through `getRelayEnv()` which spreads `...process.env`. The `delete` statements in 2e handle this case. - ---- - -## 3. CLI Base URL Override Registry - -New file: `packages/sdk/src/workflows/cli-proxy-overrides.ts` - -Each coding agent CLI uses different env vars to override the LLM API base URL. The proxy works by redirecting these base URLs to the proxy endpoint. 
- -```typescript -import type { AgentCli } from './types.js'; - -/** Maps CLI name -> env var overrides needed to redirect LLM calls through the proxy. */ -const CLI_BASE_URL_OVERRIDES: Record Record> = { - // Claude Code - claude: (url) => ({ - ANTHROPIC_BASE_URL: url, - ANTHROPIC_API_KEY: 'proxy', // Claude Code requires a non-empty key - }), - - // OpenAI Codex CLI - codex: (url) => ({ - OPENAI_BASE_URL: url, - OPENAI_API_KEY: 'proxy', - }), - - // OpenCode - opencode: (url) => ({ - OPENAI_BASE_URL: url, - OPENAI_API_KEY: 'proxy', - }), - - // Aider - aider: (url) => ({ - OPENAI_API_BASE: url, - OPENAI_API_KEY: 'proxy', - }), - - // Gemini CLI - gemini: (url) => ({ - GOOGLE_API_BASE: url, - }), - - // Goose (uses OpenAI-compatible endpoint) - goose: (url) => ({ - OPENAI_BASE_URL: url, - OPENAI_API_KEY: 'proxy', - }), - - // Droid (uses OpenAI-compatible endpoint) - droid: (url) => ({ - OPENAI_BASE_URL: url, - OPENAI_API_KEY: 'proxy', - }), - - // Cursor / Cursor Agent (uses OpenAI-compatible endpoint) - cursor: (url) => ({ - OPENAI_BASE_URL: url, - OPENAI_API_KEY: 'proxy', - }), - 'cursor-agent': (url) => ({ - OPENAI_BASE_URL: url, - OPENAI_API_KEY: 'proxy', - }), -}; - -/** Generic fallback: set both major provider base URLs. */ -const GENERIC_FALLBACK = (url: string): Record => ({ - OPENAI_BASE_URL: url, - ANTHROPIC_BASE_URL: url, - OPENAI_API_KEY: 'proxy', - ANTHROPIC_API_KEY: 'proxy', -}); - -/** - * Resolve the env var overrides needed to route a CLI's LLM calls through the proxy. - * - * @param cli - The agent CLI type (e.g. "claude", "codex", "aider") - * @param proxyUrl - The credential proxy endpoint URL - * @returns Record of env vars to inject into the agent's environment - */ -export function resolveCliBaseUrlOverrides(cli: AgentCli | string, proxyUrl: string): Record { - const resolver = CLI_BASE_URL_OVERRIDES[cli] ?? GENERIC_FALLBACK; - return resolver(proxyUrl); -} -``` - ---- - -## 4. 
Workflow Config Example - -```yaml -version: '1' -name: multi-agent-with-proxy -description: Agents use credential proxy instead of raw API keys - -swarm: - pattern: fan-out - credentialProxy: - proxyUrl: 'https://agentrelay.com/llm-proxy' - jwtSecret: '$RELAY_PROXY_SECRET' # resolved from env - defaultBudget: 100000 - providers: - anthropic: - credentialId: 'nango-anthropic-prod' - openai: - credentialId: 'nango-openai-prod' - openrouter: - credentialId: 'nango-openrouter-prod' - -agents: - - name: generator - cli: claude - role: 'Code generator' - credentials: - proxy: true # opt-in to proxy mode - # Agent receives ANTHROPIC_BASE_URL pointing to proxy - # and a scoped JWT — never sees the real Anthropic key - - - name: reviewer - cli: codex - role: 'Code reviewer' - credentials: - proxy: true - budget: 50000 # override default budget - # Agent receives OPENAI_BASE_URL pointing to proxy - - - name: legacy-agent - cli: aider - role: 'Legacy helper' - # No credentials.proxy — gets normal env, no proxy -``` - ---- - -## 5. Data Flow - -``` -relay.yaml Runner Agent Process -───────── ────── ───────────── -credentialProxy config ──→ provisionAgents() - │ - ├─ for each agent w/ proxy:true - │ └─ mintProxyToken(claims, secret) ──→ JWT - │ - ├─ spawnAndWait() / execNonInteractive() - │ ├─ resolveCliBaseUrlOverrides(cli, proxyUrl) - │ ├─ inject RELAY_LLM_PROXY_URL - │ ├─ inject RELAY_LLM_PROXY_TOKEN_ - │ ├─ inject CLI-specific base URL overrides - │ └─ strip raw API keys from env - │ - └─ Agent spawns with proxy env ──→ CLI makes API call - │ - ├─ Base URL → proxy - ├─ Proxy validates JWT - ├─ Proxy fetches real - │ credential from Nango - └─ Proxy forwards to - real provider API -``` - ---- - -## 6. Backwards Compatibility - -- **No `credentialProxy` in swarm config**: Zero behavior change. No proxy tokens minted. -- **No `credentials.proxy` on agent**: Zero behavior change. Agent gets normal env. -- **Mixed mode**: Some agents use proxy, others don't. 
Each agent's env is independent. -- **`filteredEnv()` unchanged**: Raw API keys are already excluded from the allowlist. Only `getRelayEnv()` (which spreads `process.env`) could leak them, and the proxy injection code explicitly deletes them. - ---- - -## 7. Security Considerations - -- **JWT scope**: Each token is scoped to one agent + one provider + one credential. An agent cannot use another agent's token for a different provider. -- **Budget enforcement**: The proxy validates budget claims and rejects requests that exceed the token's budget. -- **Key stripping**: Raw API keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENROUTER_API_KEY`) are deleted from the agent's env when proxy mode is active, preventing bypass. -- **Secret resolution**: `jwtSecret` supports `$ENV_VAR` syntax so the secret never appears in YAML files. -- **Token TTL**: Tokens use the 15-minute default TTL from `DEFAULT_PROXY_TOKEN_TTL_SECONDS`. For long-running agents, the runner should refresh tokens (future enhancement). - ---- - -## 8. Files to Modify - -| File | Change | -| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- | -| `packages/sdk/src/workflows/types.ts` | Add `AgentCredentials`, `CredentialProxyConfig`, update `AgentDefinition` and `SwarmConfig` | -| `packages/sdk/src/workflows/runner.ts` | Import proxy JWT, add `proxyTokens` map, modify `provisionAgents()`, `execNonInteractive()`, `spawnAndWait()` | -| `packages/sdk/src/workflows/cli-proxy-overrides.ts` | **New file** — CLI base URL override registry | -| `packages/sdk/src/workflows/schema.json` | Add `credentialProxy` and `credentials` to validation schema | - ---- - -## 9. Implementation Order - -1. **Types first** — Add `AgentCredentials` and `CredentialProxyConfig` to `types.ts` -2. **CLI overrides** — Create `cli-proxy-overrides.ts` with the resolver registry -3. 
**Runner integration** — Wire up minting + env injection in `runner.ts` -4. **Schema update** — Add new fields to `schema.json` for YAML validation -5. **Tests** — Unit tests for `resolveCliBaseUrlOverrides()`, integration tests for env injection diff --git a/README.md b/README.md index c91d2105d..6e5c65741 100644 --- a/README.md +++ b/README.md @@ -1,33 +1,55 @@ -![Agent Relay](./readme-banner.png) +Agent Relay +**Website:** [agentrelay.com](https://agentrelay.com) · **Docs:** [agentrelay.com/docs](https://agentrelay.com/docs) +npm +Tests +License -
+
- +## Multi-Agent Orchestration -[![Featured on OSSCAR](https://osscar.dev/api/badge?slug=agentworkforce)](https://osscar.dev/org/agentworkforce) +Enable your Claude Code, Codex, or OpenCode agent to spawn agent teams that can communicate and collaborate. Not subagents, but real agents that +can spawn their own subagents. This allows powerful cross-collaboration between AI agents, so you can get the best harnesses and models working +together. -Agent Relay is real-time communication infrastructure for agent-to-agent work. Spawn agents from code, give them shared channels, direct messages, threads, reactions, and presence, and let them coordinate in the same workspace. +## Benefits Over Subagents -It is not a framework or a harness. Your agents keep running however they already run. Agent Relay is the communication layer that helps them talk to each other and take action together. +1. The orchestrating agent has full insight into what the spawned agents are doing. It can read their logs and steer them mid-turn if needed. +2. Enables advanced swarm techniques, since agents can communicate with one another and coordinate to form different types of teams: review/fix loops, adversarial/debate pairs, fan-out -> pipeline -> gather, or lead + workers, to name a few. +3. Diversity of thought and implementation. Having Codex implement, Claude review, and Gemini do the final verification leads to better results, since different models and harnesses excel at different things. +4. Review happens as a conversation between the live reviewer and the live implementer, not as a report handed back to the parent after each one finishes. +5. An audit trail exists outside the agent and outside the parent. With the [Agent Relay Observer](https://agentrelay.com/observer) you get full auditability into every DM and group message sent by the agents. -**Website:** [agentrelay.com](https://agentrelay.com) · **Docs:** [agentrelay.com/docs](https://agentrelay.com/docs) +## Get Started - npm - Tests - License - +1.
Install the agent-relay CLI + +```bash +curl -fsSL https://raw.githubusercontent.com/AgentWorkforce/relay/main/install.sh | bash + +``` + +2. Install the skill + +```bash +npx skills add https://github.com/agentworkforce/skills --skill orchestrating-agent-relay +``` + +3. Tell your agent to use it + +```text +use the orchestrating-agent-relay skill to spawn a claude and codex agent and [YOUR_TASK] +``` +For single, well-scoped, one-shot tasks, subagents still win. Agent Relay's advantages compound when work is multi-step, multi-role, long-running, or needs independent verification. -## Why Agent Relay +## SDK -- **Built for real-time coordination**: channels, messages, inboxes, reactions, and presence for agents that need to collaborate. -- **Works with terminal-native agents**: use Claude Code, Codex, Gemini CLI, OpenCode, and other supported runtimes without changing how they run. -- **SDK-first**: spawn agents programmatically, route work, wait for readiness, and manage lifecycles from TypeScript or Python. -- **Useful from both code and tools**: wire Relay into apps, scripts, plugins, and local workflows. +Use the Agent Relay SDK to spawn and control agents programmatically. -## Install +### Install **TypeScript / Node.js** @@ -45,7 +67,7 @@ pip install agent-relay-sdk See the [Python SDK](./packages/sdk-py) for Python usage and adapters. -## Quick example +### Quick example ```typescript import { AgentRelay, Models } from '@agent-relay/sdk'; @@ -82,53 +104,17 @@ await relay.shutdown(); Want more than a toy example?
Start with: -- [Introduction](./docs/introduction.md) -- [CLI on the Relay](./docs/cli-on-the-relay.md) -- [Examples](./examples/README.md) -- [TypeScript SDK README](./packages/sdk/README.md) -- [Python SDK README](./packages/sdk-py/README.md) +- [Introduction](https://agentrelay.com/docs/introduction) +- [TypeScript SDK docs](https://agentrelay.com/docs/typescript-sdk) +- [Python SDK docs](https://agentrelay.com/docs/python-sdk) -## What you can build +### What you can build - Multi-agent coding flows with shared channels and worker handoffs - Agent inboxes for status updates, blockers, and review loops - Tooling that lets existing agents communicate without rewriting their runtime - Local or remote coordination patterns where multiple agents need shared context -## Claude Code plugin - -Use Agent Relay directly inside Claude Code, no SDK required. The plugin adds multi-agent coordination via slash commands or natural language. - -```text -/plugin marketplace add Agentworkforce/skills -/plugin install claude-relay-plugin -``` - -Once installed, you can coordinate teams of agents with built-in skills: - -```text -> /relay-team Refactor the auth module, split the middleware, update tests, and update docs -> /relay-fanout Run linting fixes across all packages in the monorepo -> /relay-pipeline Analyze the API logs, generate a summary report, then draft an email -``` - -Or just describe what you want in plain language: - -```text -> Use relay fan-out to lint all packages in parallel -> Split the migration into three relay workers, one for the schema, one for the API, and one for the frontend -``` - -See [docs/plugin-claude-code.md](./docs/plugin-claude-code.md) and the [plugin README](https://github.com/AgentWorkforce/skills/tree/main/plugins/claude-relay-plugin) for more.
- -## Agent Relay CLI - -Install the CLI with: - -```bash -curl -fsSL https://raw.githubusercontent.com/AgentWorkforce/relay/main/install.sh | bash -``` - Then use Agent Relay to bring agents into a shared workspace and route work between them. ## Supported agents and runtimes @@ -142,7 +128,7 @@ Agent Relay is designed for terminal-native agents and SDK-driven workflows. Thi The broader SDK and workflow surface also includes additional integrations in the codebase. See the package docs for details. -## Development +### Development If you want to work on the repo itself: @@ -154,7 +140,6 @@ npm test Useful references: -- [ARCHITECTURE.md](./ARCHITECTURE.md) - [CHANGELOG.md](./CHANGELOG.md) - [GitHub Issues](https://github.com/AgentWorkforce/relay/issues) diff --git a/TRACEBACK_DESIGN.md b/TRACEBACK_DESIGN.md deleted file mode 100644 index 19294a79b..000000000 --- a/TRACEBACK_DESIGN.md +++ /dev/null @@ -1,422 +0,0 @@ -# Verification Traceback Pattern — Design Document - -## Problem - -When a verification check fails and the runner retries a step, the retry prompt currently includes: - -1. The raw error message -2. The last 2000 characters of the previous agent's output -3. For custom verification: the command and its output - -This is a blunt instrument. The failing agent receives a wall of text and must self-diagnose what went wrong. For complex verification failures (e.g., `npx nango compile` producing 50 lines of TypeScript errors), the agent often wastes its retry attempt misinterpreting the error or fixing the wrong file. - -**Marcin's insight**: "It's a DAG, so technically no loops." The review-loop template (`builtin-templates/review-loop.yaml`) achieves review via a DAG topology — separate steps for implement, review, consolidate, address. But diagnostic traceback is fundamentally different: it must happen _within_ the retry loop, not as a separate DAG step. - -**Solution**: Spawn an ephemeral diagnostic agent inside the runner's retry flow. 
This agent analyzes the failure and produces targeted guidance that gets injected into the retry prompt — replacing the raw 2000-char truncation with intelligent analysis. - ---- - -## 1. New `VerificationCheck` Field: `diagnosticAgent` - -### Type Change - -```typescript -// packages/sdk/src/workflows/types.ts -export interface VerificationCheck { - type: 'output_contains' | 'exit_code' | 'file_exists' | 'custom'; - value: string; - description?: string; - timeoutMs?: number; - /** Name of an agent defined in the workflow's agents list. - * When set, and verification fails with retries remaining, - * this agent is spawned to analyze the failure before retry. */ - diagnosticAgent?: string; -} -``` - -The field is optional. When omitted, existing retry behavior is preserved exactly. - -### Schema Change - -In `schema.json`, add to the `VerificationCheck` definition: - -```json -"diagnosticAgent": { - "type": "string", - "description": "Agent name to spawn for failure diagnosis before retry" -} -``` - -### Validation - -During preflight/dry-run, if `diagnosticAgent` is set: - -- The named agent **must** exist in the workflow's `agents` list -- Warning if the step has `retries: 0` or no `retries` (diagnostic agent would never run) - ---- - -## 2. Runner Integration - -### Where It Hooks In - -The traceback logic lives in `executeAgentStep()` in `runner.ts`, specifically in the retry prompt construction block (currently lines ~4203-4219). 
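The retry-prompt selection and the diagnostic sub-timeout in this traceback design reduce to two small pure functions. The TypeScript sketch below is illustrative only: `diagnosticTimeoutMs`, `buildRetryPrefix`, and `RetryContext` are hypothetical names (not present in `runner.ts`), the prompt wording is abridged, and the retry header uses an ASCII hyphen.

```typescript
// Hypothetical sketch: none of these names exist in runner.ts.

/** Diagnostic sub-timeout: the agent's own constraints.timeoutMs, else a
 *  60 s default, capped by the time the step has left (if known). */
export function diagnosticTimeoutMs(agentTimeoutMs?: number, remainingStepTimeMs?: number): number {
  return Math.min(agentTimeoutMs ?? 60_000, remainingStepTimeMs ?? Infinity);
}

export interface RetryContext {
  attempt: number; // 1-based number of the attempt about to run
  maxAttempts: number; // retries + 1
  error: string; // raw verification error from the failed attempt
  verificationOutput: string; // output of the verification command
  previousOutput: string; // full output of the failed agent attempt
  diagnosis: string | null; // null when no diagnostic agent ran, timed out, or failed
}

/** Uses the diagnostic analysis when present; otherwise falls back to the
 *  existing raw retry context (error + last 2000 chars of agent output). */
export function buildRetryPrefix(ctx: RetryContext): string {
  const header = `[RETRY - Attempt ${ctx.attempt}/${ctx.maxAttempts}]`;
  if (ctx.diagnosis !== null) {
    return [
      header,
      'Verification failed. A diagnostic agent analyzed the failure:',
      '--- Diagnostic Analysis ---',
      ctx.diagnosis,
      '--- End Analysis ---',
      'Original verification error:',
      'Output (last 500 chars):',
      ctx.verificationOutput.slice(-500), // truncated harder: the analysis is primary
    ].join('\n');
  }
  return [
    header,
    'Previous attempt failed:',
    ctx.error,
    'Previous output (last 2000 chars):',
    ctx.previousOutput.slice(-2000),
  ].join('\n');
}
```

Keeping both pieces pure means the fallback path (no diagnosis, or a timed-out diagnostic agent reporting `null`) can be unit-tested without spawning any agent.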
- -Current flow: - -``` -attempt loop start - → resolve task with step output variables - → if attempt > 0: prepend [RETRY] context (raw error + last 2000 chars) - → spawn agent - → collect output - → run verification - → if verification fails: throw WorkflowCompletionError - → catch block: lastError = error, continue loop -attempt loop end -``` - -New flow: - -``` -attempt loop start - → resolve task with step output variables - → if attempt > 0: prepend [RETRY] context (see below) - → spawn agent - → collect output - → run verification - → if verification fails AND diagnosticAgent is set AND retries remain: - a. spawn diagnostic agent (ephemeral, non-interactive) - b. collect diagnostic output - c. store diagnostic output for next iteration's retry prompt - → throw WorkflowCompletionError (unchanged) - → catch block: lastError = error, continue loop -attempt loop end -``` - -### Diagnostic Agent Prompt - -When verification fails and `diagnosticAgent` is configured, the runner spawns the diagnostic agent with this prompt: - -``` -The following verification failed after step "". - -Verification command: -Verification output: - - -Step task was: - - -Step output (last 2000 chars): - - -Analyze what went wrong. Your response will be injected into the retry prompt -for the original agent. Be specific about: -- Which file(s) have issues -- What the exact error is (line numbers, error codes) -- What the agent should do differently on the next attempt - -Do NOT fix the code yourself — just diagnose. -``` - -### Modified Retry Prompt - -When diagnostic output is available, the retry prompt changes from: - -``` -[RETRY — Attempt 2/3] -Previous attempt failed: -[VERIFICATION FAILED] Your code did not pass the verification check. -Command: npx nango compile -Output: - - -Fix the issues above before proceeding. -Previous output (last 2000 chars): - ---- - -``` - -To: - -``` -[RETRY — Attempt 2/3] -Verification failed. 
A diagnostic agent analyzed the failure: - ---- Diagnostic Analysis --- - ---- End Analysis --- - -Original verification error: -Command: npx nango compile -Output (last 500 chars): - - ---- - -``` - -The raw verification output is kept but truncated more aggressively (500 chars instead of 2000) since the diagnostic analysis is the primary guidance. - -### Implementation Location - -New private method on `WorkflowRunner`: - -```typescript -private async runDiagnosticAgent( - step: WorkflowStep, - verificationError: string, - agentOutput: string, - originalTask: string, - agentMap: Map, - timeoutMs?: number -): Promise -``` - -Returns the diagnostic output, or `null` if: - -- The diagnostic agent is not configured -- The diagnostic agent timed out -- The diagnostic agent failed to spawn - -New instance field to store diagnostic output between retry iterations: - -```typescript -private lastDiagnosticOutput = new Map(); -``` - ---- - -## 3. Builder API - -### Step Configuration - -```typescript -const workflow = new WorkflowBuilder('nango-sync') - .agent('generator', { cli: 'claude', role: 'Code generator' }) - .agent('reviewer', { cli: 'claude', role: 'Diagnostic reviewer', interactive: false }) - .step('generate', { - agent: 'generator', - task: 'Implement the Nango sync integration for ...', - verification: { - type: 'custom', - value: 'cd nango-integrations && npx nango compile', - diagnosticAgent: 'reviewer', - }, - retries: 2, - }) - .build(); -``` - -### YAML Configuration - -```yaml -agents: - - name: generator - cli: claude - role: Code generator - - - name: reviewer - cli: claude - role: Diagnostic reviewer - interactive: false - constraints: - maxTokens: 4000 - timeoutMs: 60000 - -workflows: - - name: nango-sync - steps: - - name: generate - agent: generator - task: | - Implement the Nango sync integration for ... - verification: - type: custom - value: cd nango-integrations && npx nango compile - diagnosticAgent: reviewer - retries: 2 -``` - ---- - -## 4. 
Diagnostic Agent Lifecycle - -### Ephemeral Spawning - -The diagnostic agent: - -- Is defined in the workflow's `agents` list (same as any other agent) -- Uses the same agent definition (CLI, model, permissions, cwd) -- Is spawned **ephemerally** by the runner — it does NOT appear as a step in the DAG -- Does NOT get registered with relay messaging (no PTY, no channel) -- Runs as `interactive: false` regardless of the agent definition's setting -- Is spawned via the same `executor.executeAgentStep()` path used for non-interactive workers - -### Not a DAG Step - -The diagnostic agent invocation: - -- Has no `WorkflowStepRow` in the database -- Has no entry in `stepStates` -- Does not appear in dry-run reports -- Does not participate in barriers or coordination -- Is invisible to the DAG topology - -It is an implementation detail of the retry mechanism, similar to how the runner already injects retry context strings. - -### Evidence Recording - -The diagnostic invocation IS recorded in the step's completion evidence: - -```typescript -this.recordStepToolSideEffect(step.name, { - type: 'diagnostic_agent', - detail: `Diagnostic agent "${diagnosticAgentName}" analyzed verification failure (attempt ${attempt})`, - raw: { - diagnosticAgent: diagnosticAgentName, - attempt, - outputLength: diagnosticOutput.length, - }, -}); -``` - -This requires adding `'diagnostic_agent'` to the `CompletionEvidenceToolSideEffectType` union. - ---- - -## 5. Timeout Handling - -### Sub-Timeout - -The diagnostic agent runs with a dedicated sub-timeout: - -| Source | Timeout | -| ---------------------------------------------- | -------------------------------------- | -| Diagnostic agent's own `constraints.timeoutMs` | Used if set | -| Default | 60,000 ms (60 seconds) | -| Step's remaining time | Capped to avoid exceeding step timeout | - -```typescript -const diagnosticTimeout = Math.min( - diagnosticAgentDef.constraints?.timeoutMs ?? 60_000, - remainingStepTimeMs ?? 
Infinity -); -``` - -### Fallback on Timeout - -If the diagnostic agent times out or errors: - -1. Log a warning: `[step-name] Diagnostic agent timed out, falling back to raw retry` -2. Fall back to the existing retry behavior (raw error + 2000 chars) -3. The retry still happens — diagnostic failure does NOT consume a retry attempt - ---- - -## 6. Budget Interaction - -### Token Accounting - -When budget enforcement is enabled (`swarm.tokenBudget`): - -- Diagnostic agent token usage counts toward the **workflow's total budget** -- Diagnostic token usage is attributed to the step being retried -- If the workflow budget is exhausted, the diagnostic agent is NOT spawned (fall back to raw retry) - -### Budget Check Before Spawning - -```typescript -if (this.budgetTracker && !this.budgetTracker.canSpend(estimatedDiagnosticTokens)) { - this.log(`[${step.name}] Skipping diagnostic agent — budget exhausted`); - return null; // fall back to raw retry -} -``` - -The `estimatedDiagnosticTokens` is a conservative estimate (default: 2000 tokens) to avoid spawning a diagnostic agent that would immediately be killed by budget enforcement. - ---- - -## 7. 
How This Differs from Existing Retry - -| Aspect | Current Retry | Traceback Retry | -| -------------- | ---------------------------------- | ----------------------------------------------- | -| Error context | Raw error string | Diagnostic agent analysis | -| Output context | Last 2000 chars (blind truncation) | Agent-analyzed output (targeted) | -| Root cause | Agent must self-diagnose | Diagnostic agent identifies root cause | -| Fix guidance | None | Specific files, errors, and suggested approach | -| Cost | Free (string ops) | 1 additional agent invocation per retry | -| Latency | None | 10-60s per diagnostic invocation | -| Fallback | N/A | Falls back to current behavior on timeout/error | - -### When to Use Traceback vs Plain Retry - -- **Plain retry** (no `diagnosticAgent`): Simple verification (output_contains, file_exists), or when the error message is self-explanatory -- **Traceback**: Complex verification (compilation, test suites, linting) where the raw output needs interpretation - ---- - -## 8. Sequence Diagram - -``` -Step Attempt 1: - Runner → spawn generator agent - Generator → produces code - Runner → run verification (npx nango compile) - Verification → FAILS (compile errors) - - Runner → diagnosticAgent is set, retries remain - Runner → spawn reviewer agent (ephemeral) - Prompt: "Verification failed. Here's the error output and - the agent's work. Diagnose what went wrong." - Reviewer → "The generator created fetchUsers.ts but imported - from 'nango' instead of '@nangohq/node'. Line 12 - has a type error: UserResponse is not exported - from the schema file. The agent should fix the - import path and use the correct type name." - Runner → store diagnostic output - -Step Attempt 2: - Runner → spawn generator agent - Prompt: "[RETRY — Attempt 2/3] - Verification failed. Diagnostic analysis: - --- The generator created fetchUsers.ts but imported - from 'nango' instead of '@nangohq/node'. Line 12 ... 
- --- - Original task: Implement the Nango sync integration..." - Generator → fixes the specific issues identified - Runner → run verification (npx nango compile) - Verification → PASSES - Step → completed_verified -``` - ---- - -## 9. Edge Cases - -1. **Diagnostic agent is the same as the step agent**: Allowed. The diagnostic agent is a separate invocation with a diagnosis-specific prompt. - -2. **Multiple verification checks on a step**: Not currently supported (VerificationCheck is singular). If added later, diagnostic agent runs once for the first failing check. - -3. **Owner-supervised steps**: Diagnostic agent runs AFTER the owner/specialist flow but BEFORE the retry. It supplements, not replaces, the owner decision flow. - -4. **Non-custom verification with diagnosticAgent**: Supported but less useful. For `file_exists`, the diagnostic prompt would include "file X does not exist" — still potentially valuable for the diagnostic agent to suggest why. - -5. **Diagnostic agent itself fails verification**: N/A — the diagnostic agent has no verification check. Its raw output is used as-is. - ---- - -## 10. Implementation Checklist - -1. **types.ts**: Add `diagnosticAgent?: string` to `VerificationCheck` interface -2. **types.ts**: Add `'diagnostic_agent'` to `CompletionEvidenceToolSideEffectType` union -3. **schema.json**: Add `diagnosticAgent` to verification check schema -4. **runner.ts**: Add `runDiagnosticAgent()` private method -5. **runner.ts**: Add `lastDiagnosticOutput` map field -6. **runner.ts**: Modify retry prompt construction in `executeAgentStep()` to use diagnostic output when available -7. **runner.ts**: Call `runDiagnosticAgent()` when verification fails with retries remaining -8. **builder.ts**: Allow `diagnosticAgent` in step verification config (pass-through, no builder changes needed beyond type) -9. **Validation**: Add preflight check that `diagnosticAgent` references a valid agent -10. 
**Tests**: Unit tests for diagnostic prompt construction, timeout fallback, budget skip diff --git a/crates/broker/src/snippets.rs b/crates/broker/src/snippets.rs index 1de484f55..9baaf92a3 100644 --- a/crates/broker/src/snippets.rs +++ b/crates/broker/src/snippets.rs @@ -11,35 +11,8 @@ use tokio::process::Command; const RELAYCAST_MCP_PACKAGE: &str = "@relaycast/mcp"; -const TARGET_FILES: [&str; 3] = ["AGENTS.md", "CLAUDE.md", "GEMINI.md"]; -const MARKER_START: &str = ""; -const MARKER_END: &str = ""; -const MARKER_START_PREFIX: &str = " -# Agent Relay Protocol - -Use AGENT_RELAY_OUTBOX and ->relay-file:spawn. - -"#; - fs::write(root.join("AGENTS.md"), legacy).expect("write legacy snippet"); - fs::write(root.join("CLAUDE.md"), legacy).expect("write legacy snippet"); - fs::write(root.join("GEMINI.md"), legacy).expect("write legacy snippet"); - fs::write( - root.join(".mcp.json"), - serde_json::json!({ - "mcpServers": { - "relaycast": { - "command": "npx", - "args": [RELAYCAST_MCP_PACKAGE] - } - } - }) - .to_string(), - ) - .expect("write .mcp.json"); - - let report = install_isolated(root).expect("upgrade snippets"); - assert_eq!(report.updated, 3); - - let content = fs::read_to_string(root.join("AGENTS.md")).expect("read AGENTS.md"); - assert!(content.contains(MARKER_START)); - assert!(!content.contains("Use AGENT_RELAY_OUTBOX and ->relay-file:spawn.")); - assert!(content.contains("Use MCP/skills only; do not use filesystem protocols.")); - } - #[test] fn creates_reaycast_mcp_config_when_missing() { let temp = tempdir().expect("tempdir"); diff --git a/docker-compose.browser.yml b/docker-compose.browser.yml deleted file mode 100644 index 4c81e293b..000000000 --- a/docker-compose.browser.yml +++ /dev/null @@ -1,78 +0,0 @@ -# Agent Relay - Browser Testing Workspace -# -# Extends docker-compose.dev.yml with browser testing capabilities. 
-# -# Usage: -# docker compose -f docker-compose.dev.yml -f docker-compose.browser.yml up -# -# Access: -# - Dashboard: http://localhost:3888 -# - VNC (web): http://localhost:6080/vnc.html -# - VNC (native): vnc://localhost:5900 - -version: '3.8' - -services: - # Browser-enabled workspace with full testing capabilities - workspace-browser: - build: - context: . - dockerfile: deploy/workspace/Dockerfile.browser - ports: - - "3888:3888" # Dashboard/API - - "3889:3889" # WebSocket - - "5900:5900" # VNC direct - - "6080:6080" # noVNC web interface - environment: - WORKSPACE_ID: browser-workspace - SUPERVISOR_ENABLED: "true" - MAX_AGENTS: "10" - # Browser display settings - DISPLAY: ":99" - SCREEN_WIDTH: "1920" - SCREEN_HEIGHT: "1080" - SCREEN_DEPTH: "24" - # VNC settings - VNC_ENABLED: "true" - VNC_PORT: "5900" - NOVNC_ENABLED: "true" - NOVNC_PORT: "6080" - volumes: - # Persistent data - - workspace_browser_data:/data - # Mount repos - - ./:/workspace/relay:ro - # Docker socket for spawning containers - - /var/run/docker.sock:/var/run/docker.sock - # Required for some browser operations - shm_size: '2gb' - # Security options for browser sandboxing - security_opt: - - seccomp:unconfined - depends_on: - - cloud - - # Alternative: Rootless Docker-in-Docker workspace - # Uses sysbox runtime for secure nested containers - workspace-dind: - build: - context: . 
- dockerfile: deploy/workspace/Dockerfile.browser - runtime: sysbox-runc # Requires sysbox installed on host - ports: - - "3898:3888" - - "6090:6080" - environment: - WORKSPACE_ID: dind-workspace - SUPERVISOR_ENABLED: "true" - MAX_AGENTS: "10" - # DinD mode - Docker daemon runs inside container - DOCKER_HOST: "unix:///var/run/docker.sock" - volumes: - - workspace_dind_data:/data - profiles: - - dind # Only start with: --profile dind - -volumes: - workspace_browser_data: - workspace_dind_data: diff --git a/docker-compose.test.yml b/docker-compose.test.yml deleted file mode 100644 index a4c990c53..000000000 --- a/docker-compose.test.yml +++ /dev/null @@ -1,202 +0,0 @@ -# Agent Relay Cloud - Full QA Test Environment -# Run with: docker compose -f docker-compose.test.yml up --build -# -# This environment simulates the full cloud stack with: -# - PostgreSQL database -# - Redis for sessions/pub-sub -# - Cloud API server -# - Simulated daemon(s) that report metrics -# - Test runner for integration tests -# -# Usage: -# # Start the full stack -# docker compose -f docker-compose.test.yml up -d -# -# # Run integration tests -# docker compose -f docker-compose.test.yml run test-runner -# -# # View logs -# docker compose -f docker-compose.test.yml logs -f -# -# # Tear down -# docker compose -f docker-compose.test.yml down -v - -version: '3.8' - -services: - # PostgreSQL database - postgres: - image: postgres:16-alpine - environment: - POSTGRES_USER: agent_relay - POSTGRES_PASSWORD: test_password - POSTGRES_DB: agent_relay_test - ports: - - "5433:5432" - volumes: - - postgres_test_data:/var/lib/postgresql/data - healthcheck: - test: ["CMD-SHELL", "pg_isready -U agent_relay"] - interval: 2s - timeout: 5s - retries: 10 - - # Redis for sessions and pub/sub - redis: - image: redis:7-alpine - ports: - - "6380:6379" - healthcheck: - test: ["CMD", "redis-cli", "ping"] - interval: 2s - timeout: 5s - retries: 10 - - # Cloud API server - cloud: - build: - context: . 
- dockerfile: Dockerfile - ports: - - "3100:3000" - environment: - NODE_ENV: test - PORT: 3000 - PUBLIC_URL: http://localhost:3100 - - # Database - DATABASE_URL: postgres://agent_relay:test_password@postgres:5432/agent_relay_test - REDIS_URL: redis://redis:6379 - - # Session - SESSION_SECRET: test-session-secret - - # Vault master key (test only) - "test-vault-key-32-bytes-testing!" = 32 bytes - VAULT_MASTER_KEY: dGVzdC12YXVsdC1rZXktMzItYnl0ZXMtdGVzdGluZyE= - - # Disable external services in test mode - STRIPE_SECRET_KEY: sk_test_placeholder - STRIPE_PUBLISHABLE_KEY: pk_test_placeholder - STRIPE_WEBHOOK_SECRET: whsec_test - - # Compute provider (docker for local) - COMPUTE_PROVIDER: docker - - # Enable memory monitoring - RELAY_MEMORY_MONITORING: "true" - RELAY_CLOUD_ENABLED: "true" - depends_on: - postgres: - condition: service_healthy - redis: - condition: service_healthy - volumes: - - /var/run/docker.sock:/var/run/docker.sock - healthcheck: - test: ["CMD", "curl", "-f", "http://localhost:3000/health"] - interval: 5s - timeout: 5s - retries: 10 - - # Simulated daemon 1 - Reports metrics to cloud - daemon-simulator-1: - build: - context: . - dockerfile: test/cloud/Dockerfile.daemon-simulator - environment: - DAEMON_NAME: test-daemon-1 - CLOUD_API_URL: http://cloud:3000 - SIMULATOR_MODE: "true" - AGENT_COUNT: "3" - REPORT_INTERVAL_MS: "5000" - # Simulate some memory issues - SIMULATE_MEMORY_GROWTH: "true" - SIMULATE_CRASH: "false" - depends_on: - cloud: - condition: service_healthy - restart: on-failure - - # Simulated daemon 2 - Normal operation - daemon-simulator-2: - build: - context: . 
- dockerfile: test/cloud/Dockerfile.daemon-simulator - environment: - DAEMON_NAME: test-daemon-2 - CLOUD_API_URL: http://cloud:3000 - SIMULATOR_MODE: "true" - AGENT_COUNT: "2" - REPORT_INTERVAL_MS: "5000" - SIMULATE_MEMORY_GROWTH: "false" - SIMULATE_CRASH: "false" - depends_on: - cloud: - condition: service_healthy - restart: on-failure - - # Simulated daemon 3 - Crash simulation - daemon-simulator-crash: - build: - context: . - dockerfile: test/cloud/Dockerfile.daemon-simulator - environment: - DAEMON_NAME: test-daemon-crash - CLOUD_API_URL: http://cloud:3000 - SIMULATOR_MODE: "true" - AGENT_COUNT: "1" - REPORT_INTERVAL_MS: "3000" - SIMULATE_MEMORY_GROWTH: "false" - SIMULATE_CRASH: "true" - CRASH_AFTER_SECONDS: "30" - depends_on: - cloud: - condition: service_healthy - profiles: - - crash-test - - # Integration test runner - test-runner: - build: - context: . - dockerfile: test/cloud/Dockerfile.test-runner - environment: - CLOUD_API_URL: http://cloud:3000 - DATABASE_URL: postgres://agent_relay:test_password@postgres:5432/agent_relay_test - REDIS_URL: redis://redis:6379 - TEST_TIMEOUT: "60000" - depends_on: - cloud: - condition: service_healthy - daemon-simulator-1: - condition: service_started - daemon-simulator-2: - condition: service_started - volumes: - - ./test:/app/test:ro - - ./src:/app/src:ro - - test_results:/app/test-results - profiles: - - test - - # WebSocket test client - ws-test-client: - build: - context: . 
- dockerfile: test/cloud/Dockerfile.ws-client - environment: - CLOUD_WS_URL: ws://cloud:3000/ws - TEST_DURATION_SECONDS: "60" - depends_on: - cloud: - condition: service_healthy - profiles: - - ws-test - -volumes: - postgres_test_data: - test_results: - -networks: - default: - name: agent-relay-test diff --git a/docs/authentication.md b/docs/authentication.md deleted file mode 100644 index 7925c91b4..000000000 --- a/docs/authentication.md +++ /dev/null @@ -1,3 +0,0 @@ -# Authentication - -See [the CLI reference](reference-cli.md) for current authentication commands and provider login flows. diff --git a/docs/cli-cloud-commands.md b/docs/cli-cloud-commands.md deleted file mode 100644 index 2b0b9df67..000000000 --- a/docs/cli-cloud-commands.md +++ /dev/null @@ -1,3 +0,0 @@ -# Cloud Commands - -See [the CLI reference](reference-cli.md) for current `agent-relay cloud` commands and flags. diff --git a/docs/cli-messaging.md b/docs/cli-messaging.md deleted file mode 100644 index 091619802..000000000 --- a/docs/cli-messaging.md +++ /dev/null @@ -1,77 +0,0 @@ -Once the broker is up, the CLI can act as a lightweight operator console for human-to-agent messages and recent conversation history. - -## Send a message - -```bash -agent-relay send reviewer "Please summarize the riskiest changes first." -``` - -The target argument accepts: - -- an agent name such as `reviewer` -- a channel such as `#general` -- `*` for broadcast - -Optional flags: - -- `--from ` sets the sender identity. Defaults to `$AGENT_RELAY_ORCHESTRATOR_NAME` or `orchestrator`. - Workers' replies are addressed to this name, so use a stable value you can read with `agent-relay replies `. -- `--thread ` keeps follow-ups grouped under an existing thread. - -## Read recent history - -```bash -agent-relay history --to '#general' --since 30m -``` - -Useful filters: - -- `--from ` keeps only one sender. -- `--to ` narrows to a target. 
When `` is not a channel, the command prints messages in chronological order with no preview truncation; pair with `--from ` to filter by sender. - For example, `agent-relay history --to Worker2 --from Worker2` is equivalent to `agent-relay replies Worker2` for the non-`--unread` case. -- `--since