feat(sdk-swift): add initial Swift SDK and harden workflow output #589
Pull request overview
This PR introduces an initial Swift SDK package (packages/sdk-swift/) and updates the TypeScript workflow tooling to provide a polished CLI renderer (listr2 + chalk), plus a CI workflow that validates/dry-runs changed workflow files.
Changes:
- Added SwiftPM package scaffold + core Swift types/transport/API + minimal XCTest coverage.
- Implemented a listr2-based workflow renderer and updated workflow runner/CLI output to use chalk styling.
- Added GitHub Actions workflow to validate and dry-run changed workflow definitions under workflows/**.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| packages/sdk-swift/Package.swift | New SwiftPM manifest for the Swift SDK package. |
| packages/sdk-swift/README.md | SDK readme + install/quickstart snippet. |
| packages/sdk-swift/Sources/AgentRelaySDK/RelayTypes.swift | Swift Codable wire types for messages/events. |
| packages/sdk-swift/Sources/AgentRelaySDK/RelayTransport.swift | URLSessionWebSocketTask-based transport with reconnect/ping. |
| packages/sdk-swift/Sources/AgentRelaySDK/RelayCast.swift | Public-facing Swift API (RelayCast, Channel, AgentClient). |
| packages/sdk-swift/Tests/AgentRelaySDKTests/AgentRelaySDKTests.swift | Minimal tests for init + channel creation. |
| packages/sdk/src/workflows/listr-renderer.ts | New reusable renderer (createWorkflowRenderer) for listr2-based event rendering. |
| packages/sdk/src/workflows/cli.ts | CLI output changed from raw event logging to listr2 rendering + chalk styling. |
| packages/sdk/src/workflows/runner.ts | Chalk styling added to workflow/broker prefixes and run summary icons/borders. |
| packages/sdk/src/workflows/index.ts | Exported createWorkflowRenderer. |
| packages/sdk/package.json | Added chalk + listr2 dependencies. |
| workflows/test-output.ts | New workflow script to smoke-test renderer output. |
| workflows/polish-workflow-output.ts | New workflow script that automates the listr2/chalk integration work. |
| workflows/add-swift-sdk.ts | New workflow script to generate/review/commit the Swift SDK with “durable output” checks. |
| .github/workflows/workflow-validation.yml | CI job to validate and dry-run changed workflow files in PRs. |
| package.json | Added chalk + listr2 at the repo root. |
function installOutputFilter(): () => void {
  const orig = console.log.bind(console);
  console.log = (...args: unknown[]) => {
    const str = String(args[0] ?? '');
    // Always show the observer URL and channel so users can follow the run
    if (str.includes('Observer:') || str.includes('agentrelay.dev') || str.includes('Channel: wf-')) {
      orig(...args);
      return;
    }
    // Block [broker] lines and [workflow HH:MM] timing lines
    if (str.startsWith('[broker]') || /^\[workflow \d{2}:\d{2}\]/.test(str)) return;
    orig(...args);
  };
  // Hand back a restore function so the caller can undo the patch
  return () => { console.log = orig; };
}
// Filter [broker] and [workflow HH:MM] noise while listr owns the terminal,
// but let the observer URL and channel name through.
function installOutputFilter(): () => void {
  const orig = console.log.bind(console);
  console.log = (...args: unknown[]) => {
    const str = String(args[0] ?? '');
    if (str.includes('Observer:') || str.includes('agentrelay.dev') || str.includes('Channel: wf-')) {
      orig(...args);
      return;
    }
    if (str.startsWith('[broker]') || /^\[workflow \d{2}:\d{2}\]/.test(str)) return;
    orig(...args);
  };
  return () => { console.log = orig; };
}
*/

const renderer = createWorkflowRenderer();
const useRenderer = process.env.DRY_RUN !== '1' && process.stdout.isTTY;
import { workflow, createWorkflowRenderer } from '@agent-relay/sdk/workflows';

const renderer = createWorkflowRenderer();
const useRenderer = process.env.DRY_RUN !== '1' && process.stdout.isTTY;
Add the package in Swift Package Manager:

```swift
.package(url: "https://github.com/AgentWorkforce/relay.git", branch: "feature/swift-sdk")
```
public func `as`(_ agentToken: String) -> AgentClient {
    AgentClient(core: core, agentName: agentToken, token: agentToken)
}
Force-pushed from b6c2d8f to 0a2c878
Cleaned this PR up:
Most of the older review comments are now outdated because they were attached to files that are no longer part of this PR diff.
handshakeInFlight = true
try await transport.connect()
try await send(.hello(HelloPayload(clientName: "AgentRelaySDK.Swift", clientVersion: "0.1.0")))
try await waitForHandshake()
🔴 handshakeInFlight left as true when transport.connect() or send() throws in ensureConnected()
If transport.connect() (line 87) or send(.hello(...)) (line 88) throws, the error propagates out of ensureConnected() but handshakeInFlight (set to true on line 85) is never reset to false. On the next call to ensureConnected(), the check at line 82 sees handshakeInFlight == true and enters waitForHandshake(), which registers a continuation with a 10-second timeout (RelayCast.swift:193). Since no connection was established and no hello_ack will arrive, the caller is blocked for the full 10-second timeout before the SDK can retry. This turns every transient connection failure into a guaranteed 10-second stall on the next attempt.
Trace of the failure scenario
1. `ensureConnected()` sets `handshakeInFlight = true` (line 85)
2. `transport.connect()` throws (e.g. server unreachable)
3. Error propagates; `handshakeInFlight` stays `true`
4. Next call to `ensureConnected()` hits line 82-83, calls `waitForHandshake()`
5. `waitForHandshake()` suspends for 10s waiting for a `hello_ack` that never arrives
6. Timeout fires via `failHandshakeIfPending`, sets `handshakeInFlight = false`
7. Only now can the next attempt proceed normally
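One way to close this gap is to clear the flag on the error path before rethrowing. The sketch below is not the SDK's code: `HandshakeGate`, `ConnectFailed`, and the injected `connect` / `sendHello` closures are hypothetical stand-ins for `RelayCore`, `transport.connect()`, and `send(.hello(...))`, and it leaves out the waiter bookkeeping that the real `ensureConnected()` has.

```swift
// Hypothetical error standing in for any transport failure.
struct ConnectFailed: Error {}

// Hypothetical, reduced model of the handshake state held by the core actor.
actor HandshakeGate {
    private(set) var handshakeInFlight = false

    /// `connect` and `sendHello` are injected so the failure path can be exercised
    /// without a real WebSocket; they are assumptions, not SDK API.
    func beginHandshake(
        connect: @Sendable () async throws -> Void,
        sendHello: @Sendable () async throws -> Void
    ) async throws {
        handshakeInFlight = true
        do {
            try await connect()
            try await sendHello()
        } catch {
            // Nothing was sent, so no hello_ack can arrive. Clear the flag now
            // so the next caller starts a fresh attempt instead of waiting
            // for the handshake timeout.
            handshakeInFlight = false
            throw error
        }
        // On success the flag stays set until the ack (or the timeout) clears it.
    }

    /// Called when the ack arrives.
    func finishHandshake() {
        handshakeInFlight = false
    }
}
```

With this shape, a thrown `connect()` leaves the gate idle, so the next attempt would not pay the 10-second wait described in the trace.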
task.resume()
state = .connected
reconnectAttempt = 0
🔴 Exponential backoff is broken — reconnectAttempt reset to 0 before connection is verified
In connect() at RelayTransport.swift:66, reconnectAttempt is unconditionally reset to 0 right after calling task.resume(), which merely starts the WebSocket connection asynchronously — it does not wait for the connection to actually succeed. When the connection subsequently fails, startReceiveLoop()'s error handler calls handleDisconnect() (RelayTransport.swift:194), which computes the delay from reconnectAttempt — but it's already 0 again. The result: every failed reconnection attempt uses a 500ms delay (reconnectDelay(for: 0)). The exponential backoff (1s, 2s, 4s, 8s, 16s, 30s) never kicks in, causing the client to hammer the server every ~500ms when it's unreachable.
Prompt for agents
In RelayTransport.swift, the `reconnectAttempt` counter on line 66 of `connect()` should NOT be reset to 0 immediately after `task.resume()`. Instead, it should be reset only after the connection is confirmed to be healthy (e.g., when `RelayCore` receives a `hello_ack` and calls a callback, or when the transport receives the first successful pong). One approach: remove `reconnectAttempt = 0` from `connect()` entirely, and instead add a new public method like `func resetReconnectCounter()` on `RelayTransport`, which `RelayCore.finishHandshake()` (the success path at RelayCast.swift:138) calls after a successful hello_ack. This ensures backoff delays keep increasing for repeatedly failing connections.
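As a rough illustration of the suggested fix, the counter can live in a small value type that only resets when the caller reports a confirmed connection. `ReconnectBackoff` and `nextDelay()` are hypothetical names; `resetReconnectCounter()` follows the name proposed above. The 0.5s base and 30s cap are inferred from the delays quoted in this comment (500ms, then 1s, 2s, 4s, 8s, 16s, 30s), not read from the source file.

```swift
import Foundation

// Hypothetical helper; the base and cap are assumptions inferred from the
// delays quoted in the review comment.
struct ReconnectBackoff {
    private(set) var attempt = 0
    let base: TimeInterval = 0.5
    let cap: TimeInterval = 30

    init() {}

    /// Returns the delay for the current attempt, then advances the counter.
    mutating func nextDelay() -> TimeInterval {
        let delay = min(base * pow(2, Double(attempt)), cap)
        attempt += 1
        return delay
    }

    /// Call only once the connection is confirmed healthy (for example after
    /// hello_ack), never right after task.resume().
    mutating func resetReconnectCounter() {
        attempt = 0
    }
}
```

Because `connect()` would no longer touch the counter, repeated failures keep climbing toward the cap, and a single confirmed handshake brings the next delay back to the base value.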
…nagement

Add missing APIs needed for MSD Mac app integration:
- brokerEvents stream: full BrokerEvent enum access for agent_spawned, worker_stream, delivery_*, etc. events
- inboundMessages stream: raw protocol-level InboundMessage access
- connectionState stream: connected/disconnected/reconnecting changes
- spawnAgent/releaseAgent: agent lifecycle management from RelayCast
- disconnect(): explicit disconnection with stream cleanup
- API key in hello handshake for broker authentication
- Forward-compatible unknown event handling: BrokerEvent.unknown and InboundMessage.unknown catch unrecognized types instead of throwing
- Route deliver_relay frames to channel subscribers as RelayChannelEvents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ownload

The e2e-tests workflow was failing with transient 504 errors when downloading the relay-dashboard binary from GitHub releases. Add retry logic with exponential backoff (up to 3 attempts) and fall back gracefully so the workflow continues even if the download fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 4d5c338 to 8d6cfd2
public var brokerEvents: AsyncStream<BrokerEvent> {
    AsyncStream<BrokerEvent> { continuation in
        Task { await core.registerBrokerEventContinuation(continuation) }
    }
}
🔴 Race condition: brokerEvents, inboundMessages, and connectionState streams register continuations asynchronously via unstructured Task
The computed properties brokerEvents (RelayCast.swift:328-330), inboundMessages (RelayCast.swift:339-341), and connectionState (RelayCast.swift:346-348) all register their AsyncStream continuations with the core actor inside an unstructured Task { await core.register...Continuation(continuation) }. Because the Task runs asynchronously, there is a window between when the AsyncStream is returned to the caller and when the continuation is actually registered in the actor's arrays. Any events emitted during this window are silently dropped.
This is especially problematic for `connectionState`: a caller who does `let states = relay.connectionState; try await channel.subscribe()` would likely miss the initial `.connected` event. Compare with `Channel`, which correctly captures its continuation synchronously in `init` and registers it synchronously in `subscribe()` (RelayCast.swift:365-380).
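One possible way to remove the window is to create and register the stream inside the actor, so registration has completed by the time the caller holds the stream. The sketch below assumes Swift 5.9 or later for `AsyncStream.makeStream`; `EventHub`, `makeStream()`, and `emit(_:)` are hypothetical names, and `String` stands in for `BrokerEvent`. It also changes the API shape from a synchronous property to an `async` method, which may or may not be acceptable for the SDK.

```swift
import Foundation

// Hypothetical reduced model of the core actor's continuation registry.
actor EventHub {
    private var continuations: [UUID: AsyncStream<String>.Continuation] = [:]

    /// Builds the stream and stores its continuation in one actor-isolated step,
    /// so no event can be emitted between "stream returned" and "continuation registered".
    func makeStream() -> AsyncStream<String> {
        let id = UUID()
        let (stream, continuation) = AsyncStream<String>.makeStream()
        continuations[id] = continuation
        // Drop the continuation once the consumer stops iterating.
        continuation.onTermination = { [weak self] _ in
            Task { [weak self] in await self?.remove(id) }
        }
        return stream
    }

    private func remove(_ id: UUID) {
        continuations[id] = nil
    }

    /// Fans an event out to every registered stream.
    func emit(_ event: String) {
        for continuation in continuations.values {
            continuation.yield(event)
        }
    }
}
```

A caller would write `let states = await hub.makeStream()` before triggering the connection. Since the default buffering policy is unbounded, an event emitted before iteration starts is still delivered.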
Allows consumers to depend on the relay repo directly via SPM:
.package(url: "https://github.com/AgentWorkforce/relay.git", branch: "main")
Points source/test paths to packages/sdk-swift/ subdirectory.
Move github.base_ref from direct ${{ }} interpolation in the run: block
to an env: variable, preventing potential script injection in the
workflow-validation CI step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nt handshakes

Add a handshakeGeneration counter that increments each time a new handshake begins. The timeout task captures the current generation and only fails the handshake if the generation still matches, preventing orphaned timeouts from interfering with reconnect handshakes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
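The generation guard described in this commit can be sketched roughly as follows. This is a reduced model, not the committed code: `HandshakeTracker`, `pending`, and the method signatures are assumptions, and only the `handshakeGeneration` idea and the `failHandshakeIfPending` name come from the PR.

```swift
// Hypothetical reduced model of the generation-guarded timeout.
actor HandshakeTracker {
    private var handshakeGeneration = 0
    private(set) var pending = false

    /// Starts a handshake and returns the generation the timeout task should capture.
    func beginHandshake() -> Int {
        handshakeGeneration += 1
        pending = true
        return handshakeGeneration
    }

    /// Success path: the ack arrived.
    func finishHandshake() {
        pending = false
    }

    /// Called by a timeout task with the generation it captured at start.
    /// Returns true only if it actually failed the current handshake.
    func failHandshakeIfPending(generation captured: Int) -> Bool {
        guard pending, captured == handshakeGeneration else {
            return false // stale timeout from an earlier handshake: ignore it
        }
        pending = false
        return true
    }
}
```

A timeout scheduled for the first handshake carries generation 1. If a reconnect has since started generation 2, that older timeout returns `false` and leaves the new handshake untouched.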
* feat(sdk-swift): add initial Swift SDK package

* fix(workflow): require durable swift sdk output

* fix(sdk-swift): address core review feedback

* fix(sdk-swift): tighten handshake concurrency

* feat(sdk-swift): expose broker events, connection state, and agent management

Add missing APIs needed for MSD Mac app integration:
- brokerEvents stream: full BrokerEvent enum access for agent_spawned, worker_stream, delivery_*, etc. events
- inboundMessages stream: raw protocol-level InboundMessage access
- connectionState stream: connected/disconnected/reconnecting changes
- spawnAgent/releaseAgent: agent lifecycle management from RelayCast
- disconnect(): explicit disconnection with stream cleanup
- API key in hello handshake for broker authentication
- Forward-compatible unknown event handling: BrokerEvent.unknown and InboundMessage.unknown catch unrecognized types instead of throwing
- Route deliver_relay frames to channel subscribers as RelayChannelEvents

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): add retry logic and graceful fallback for dashboard binary download

The e2e-tests workflow was failing with transient 504 errors when downloading the relay-dashboard binary from GitHub releases. Add retry logic with exponential backoff (up to 3 attempts) and fall back gracefully so the workflow continues even if the download fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add root Package.swift exposing AgentRelaySDK

Allows consumers to depend on the relay repo directly via SPM:
.package(url: "https://github.com/AgentWorkforce/relay.git", branch: "main")
Points source/test paths to packages/sdk-swift/ subdirectory.

* fix(ci): use env var for github.base_ref to prevent script injection

Move github.base_ref from direct ${{ }} interpolation in the run: block to an env: variable, preventing potential script injection in the workflow-validation CI step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(sdk-swift): prevent stale handshake timeout from failing subsequent handshakes

Add a handshakeGeneration counter that increments each time a new handshake begins. The timeout task captures the current generation and only fails the handshake if the generation still matches, preventing orphaned timeouts from interfering with reconnect handshakes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Khaliq <khaliqgant@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
This PR now has a cleaned history and diff based on current `main`.
It contains only two things:
- `packages/sdk-swift/`
- `workflows/add-swift-sdk.ts` workflow that is intended to generate/review/commit that SDK in future runs

What changed during cleanup
The original branch accidentally carried extra workflow-renderer / CI-related history that had already moved on separately. The branch has now been rebuilt on top of current `main`, and force-pushed so the PR only contains the Swift SDK work.
As a result:

Included files
Swift SDK package
- `packages/sdk-swift/Package.swift`
- `packages/sdk-swift/README.md`
- `packages/sdk-swift/Sources/AgentRelaySDK/RelayTypes.swift`
- `packages/sdk-swift/Sources/AgentRelaySDK/RelayTransport.swift`
- `packages/sdk-swift/Sources/AgentRelaySDK/RelayCast.swift`
- `packages/sdk-swift/Tests/AgentRelaySDKTests/AgentRelaySDKTests.swift`

Workflow
- `workflows/add-swift-sdk.ts`

Validation notes
Local `swift build` on this machine was blocked by a host SwiftPM / CommandLineTools manifest-linker issue before meaningful package compilation:
So the package is reviewable and committed, but further Swift validation may still need to happen in a healthier toolchain environment.