Factory: go serverless via relayfile webhook delivery + remove polling loops

## Background

The factory pipeline (Linear intent → Slack clarification → GitHub PR) currently runs as a long-running daemon process spawned by Electron. The architecture already has the right abstractions (`FactoryPorts`, clean `LinearWriteback`/`SlackWriteback`/`GithubRead` interfaces, `RelayfileCloudMountClient`). What blocks a production-grade, multi-tenant cloud deployment is the process model, the polling loops, the hardcoded service coupling, and the lack of an onboarding flow.

---

## 1. Remove polling loops (they're already redundant)

`RelayFileSync` from `@relayfile/sdk` is a WebSocket push stream. When `baseUrl` (`handle.info.relayfileUrl`) is present, the factory already receives push events for:

- `/linear/issues/**` — issue state changes
- `/slack/channels/**` — Slack messages
- `/github/repos/**` — PR state

The polling loops in `factory.ts` are defensive fallbacks that shouldn't be the primary model:

| Loop | Interval | Status |
|------|----------|--------|
| Event-client poll | 5s | Fallback after 5 WebSocket errors — fine to keep as safety net |
| Slack reply poll | 5s | Second fallback if `mount.subscribe()` throws; also fine to keep as safety net |
| **PR completion sweep** | **15s** | **Eliminate if relayfile reliably pushes `/github/repos/**` on draft→ready transitions** |
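
To keep the two 5s loops as safety nets rather than the primary model, the switch from push to poll can sit behind a small gate. This is a minimal sketch, not the code in `factory.ts`: `FallbackGate` and its methods are hypothetical names, and it assumes the 5-error threshold counts consecutive errors, which the table does not specify.

```typescript
// Hypothetical helper; only the 5-error threshold comes from the table above.
type DeliveryMode = 'push' | 'poll'

class FallbackGate {
  private consecutiveErrors = 0
  private readonly maxErrors: number

  constructor(maxErrors: number = 5) {
    this.maxErrors = maxErrors
  }

  // Call whenever the WebSocket stream reports an error.
  recordError(): DeliveryMode {
    this.consecutiveErrors += 1
    return this.mode()
  }

  // Call whenever a push event arrives; a healthy stream resets the count.
  recordEvent(): DeliveryMode {
    this.consecutiveErrors = 0
    return this.mode()
  }

  // Polling is active only while the error count is at or above the threshold.
  mode(): DeliveryMode {
    return this.consecutiveErrors >= this.maxErrors ? 'poll' : 'push'
  }
}
```

With this shape, polling starts only after the threshold is hit and stops again as soon as a push event arrives.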

---

## 2. The key unlock: relayfile webhook delivery

**Does relayfile support HTTP webhook delivery (POST to an endpoint) in addition to WebSocket?**

If yes, the factory becomes a pure serverless app with no persistent connections:

```
relayfile HTTP webhook → API Gateway / Cloudflare Worker → handler
  /linear/issues/** updated  → triage() + dispatch()
  /slack/channels/** updated → route clarification to agent
  /github/repos/** updated   → completeIssue() if PR non-draft
```

If relayfile only supports WebSocket, the fallback is a single lightweight always-on relay worker that maintains the WebSocket and fans out to handlers per event.
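
Either way, the per-event routing is the same. A minimal sketch of the path-to-handler mapping from the diagram above, assuming the delivered event carries the changed path in a `path` field (the real relayfile payload shape is not confirmed); `MountEvent`, `HandlerName`, and `routeEvent` are hypothetical names:

```typescript
// Hypothetical event shape: relayfile's actual webhook payload is not confirmed.
interface MountEvent {
  path: string
}

type HandlerName = 'triage-and-dispatch' | 'route-clarification' | 'check-pr-completion'

// Prefixes mirror the mount paths listed in section 1.
const ROUTES: ReadonlyArray<readonly [string, HandlerName]> = [
  ['/linear/issues/', 'triage-and-dispatch'],
  ['/slack/channels/', 'route-clarification'],
  ['/github/repos/', 'check-pr-completion'],
]

function routeEvent(event: MountEvent): HandlerName | null {
  for (const [prefix, handlerName] of ROUTES) {
    if (event.path.startsWith(prefix)) return handlerName
  }
  return null // unknown paths are ignored rather than treated as errors
}
```

The same function can back a webhook handler or the always-on relay worker, so the choice of transport does not leak into the handlers.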

**This is the most important thing to verify/build in relayfile.**

---

## 3. Serverless architecture

```
Today:
  Electron → FactoryManager → factory daemon (long-running local process)

Target:
  relayfile webhook → serverless handler (stateless) → durable state → FleetClient (remote agents)
```

State that moves from in-memory to durable storage:
- `BatchTracker` (in-flight issues, queued issues, agent→issue map)
- `InFlightRegistry` (spawned agent records)
- Slack "waiting-for-clarification" state (currently held open in `#watchSlackThread`)

The Slack wait becomes a state machine:

```
dispatch() → low confidence triage
  → write {issueId, slackThreadId, status: 'waiting-clarification'} to durable state
  → handler exits

Slack reply webhook arrives → new handler invocation
  → look up record by threadId → inject answer → spawn agents
```

---

## 4. Multi-cloud: AWS and Cloudflare (to start)

The `StateStore` port must be cloud-agnostic. Factory core has zero cloud-specific imports.

### AWS
- **Handler**: Lambda (Node.js runtime)
- **State**: DynamoDB
- **Ingress**: API Gateway → Lambda
- **Cron**: EventBridge Scheduler

### Cloudflare
- **Handler**: Cloudflare Workers
- **State**: Durable Objects (strongly-consistent batch tracking) + KV (read-heavy lookups)
- **Ingress**: Workers route — no API Gateway needed
- **Cron**: Workers Cron Triggers

### StateStore port

```typescript
interface StateStore {
  getInFlight(issueId: string): Promise<InFlightRecord | null>
  putInFlight(record: InFlightRecord): Promise<void>
  deleteInFlight(issueId: string): Promise<void>
  listInFlight(): Promise<InFlightRecord[]>
  getWaitingClarification(threadId: string): Promise<ClarificationRecord | null>
  putWaitingClarification(record: ClarificationRecord): Promise<void>
  deleteWaitingClarification(threadId: string): Promise<void>
}
// Impls: DynamoStateStore, DurableObjectStateStore, InMemoryStateStore
```
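
For tests and local runs, `InMemoryStateStore` could look roughly like this. It is a sketch only: the issue names `InFlightRecord` and `ClarificationRecord` but does not define them, so the fields below are assumptions.

```typescript
// Assumed record shapes: the issue names these types but does not define them.
interface InFlightRecord { issueId: string; agentId: string }
interface ClarificationRecord { threadId: string; issueId: string }

interface StateStore {
  getInFlight(issueId: string): Promise<InFlightRecord | null>
  putInFlight(record: InFlightRecord): Promise<void>
  deleteInFlight(issueId: string): Promise<void>
  listInFlight(): Promise<InFlightRecord[]>
  getWaitingClarification(threadId: string): Promise<ClarificationRecord | null>
  putWaitingClarification(record: ClarificationRecord): Promise<void>
  deleteWaitingClarification(threadId: string): Promise<void>
}

class InMemoryStateStore implements StateStore {
  private readonly inFlight = new Map<string, InFlightRecord>()
  private readonly waiting = new Map<string, ClarificationRecord>()

  async getInFlight(issueId: string): Promise<InFlightRecord | null> {
    return this.inFlight.get(issueId) ?? null
  }
  async putInFlight(record: InFlightRecord): Promise<void> {
    this.inFlight.set(record.issueId, { ...record }) // store a copy of the caller's object
  }
  async deleteInFlight(issueId: string): Promise<void> {
    this.inFlight.delete(issueId)
  }
  async listInFlight(): Promise<InFlightRecord[]> {
    return Array.from(this.inFlight.values())
  }
  async getWaitingClarification(threadId: string): Promise<ClarificationRecord | null> {
    return this.waiting.get(threadId) ?? null
  }
  async putWaitingClarification(record: ClarificationRecord): Promise<void> {
    this.waiting.set(record.threadId, { ...record })
  }
  async deleteWaitingClarification(threadId: string): Promise<void> {
    this.waiting.delete(threadId)
  }
}
```

Because every method already returns a `Promise`, code written against this store should run unchanged against the DynamoDB and Durable Objects impls.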

Handler entry points are thin adapters:

```typescript
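// aws/handler.ts
import type { APIGatewayProxyEvent } from 'aws-lambda'

export const handler = async (event: APIGatewayProxyEvent) => {
  // Built per invocation: the handler keeps no state between calls.
  const factory = buildFactory({ store: new DynamoStateStore(), fleet: ... })
  return factory.handleWebhook(event.body)
}

// cloudflare/worker.ts
export default {
  async fetch(request: Request, env: Env) {
    const factory = buildFactory({ store: new DurableObjectStateStore(env.STATE), fleet: ... })
    // Pass the raw body so both adapters call handleWebhook with the same input.
    return factory.handleWebhook(await request.text())
  }
}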
```

---

## 5. Provider abstraction (remove hardcoded Linear/Slack/GitHub coupling)

The factory core is currently coupled to three specific services at every level. Coupling lives in: mount paths (`/linear/issues/**`), the `LinearIssue` type, Linear state UUIDs, Slack timestamp format, GitHub `isDraft` field, and dispatch templates (`gh pr create`, `AgentWorkforce/${repo}`).

### WorkItem — replace LinearIssue everywhere

```typescript
interface WorkItem {
  id: string
  title: string
  description: string
  state: string                      // abstract name, not a provider UUID
  labels: string[]
  source: { provider: string; externalId: string; url?: string }
  metadata: Record<string, unknown>  // provider-specific extras pass through opaquely
}
```
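
As an illustration of how a provider impl might produce a `WorkItem`, here is a mapping from a Linear-style issue. `LinearIssueLike` is a hypothetical subset and not the real `LinearIssue` type, and the `stateNames` lookup table is an assumed way of turning state UUIDs into abstract names.

```typescript
interface WorkItem {
  id: string
  title: string
  description: string
  state: string
  labels: string[]
  source: { provider: string; externalId: string; url?: string }
  metadata: Record<string, unknown>
}

// Hypothetical subset of a Linear issue; the real LinearIssue type may differ.
interface LinearIssueLike {
  id: string
  identifier: string
  title: string
  description?: string
  stateId: string // provider-specific UUID
  labels: string[]
  url: string
}

// stateNames maps provider state UUIDs to abstract names such as "ready" or "done".
function toWorkItem(issue: LinearIssueLike, stateNames: Record<string, string>): WorkItem {
  return {
    id: issue.id,
    title: issue.title,
    description: issue.description ?? '',
    state: stateNames[issue.stateId] ?? 'unknown',
    labels: [...issue.labels],
    source: { provider: 'linear', externalId: issue.id, url: issue.url },
    metadata: { identifier: issue.identifier }, // provider-specific extras pass through opaquely
  }
}
```

The UUID lookup stays inside the provider impl, so the factory core only ever compares abstract state names.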

### Role-based ports — rename from provider names to roles

```typescript
// Was: LinearWriteback
interface WorkItemProvider {
  subscribe(onChange: (item: WorkItem) => void): Subscription
  setState(id: string, state: 'dispatched' | 'done'): Promise<void>
  postComment(id: string, body: string): Promise<void>
  isReadyForDispatch(item: WorkItem): boolean
}

// Was: SlackWriteback
interface ClarificationChannel {
  openThread(context: { title: string; body: string }): Promise<string>
  reply(threadId: string, text: string): Promise<void>
  subscribe(threadId: string, onReply: (text: string) => void): Subscription
}

// Was: GithubRead + GithubMergeGate
interface OutputTarget {
  isPrComplete(workItemId: string): Promise<boolean>
}
```

Concrete impls: `LinearWorkItemProvider`, `SlackClarificationChannel`, `GithubOutputTarget`. Adding Jira means writing `JiraWorkItemProvider` — nothing else in the factory changes.

### Pluggable task template — replace hardcoded dispatch strings

The dispatch template today hardcodes `gh pr create --base main`, `AgentWorkforce/${repo}`, `Linear issue: ${key}`. Make it injectable:

```typescript
interface TaskTemplate {
  render(item: WorkItem, route: Route, opts: TemplateOpts): string
}
```
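
One possible template impl that reproduces today's behavior while taking the org and base branch as inputs. `Route` and `TemplateOpts` are not defined in this issue, so their fields here are assumptions, and `githubPrTemplate` is a hypothetical name.

```typescript
interface WorkItem {
  id: string
  title: string
  description: string
  state: string
  labels: string[]
  source: { provider: string; externalId: string; url?: string }
  metadata: Record<string, unknown>
}

// Assumed shapes: the issue references Route and TemplateOpts without defining them.
interface Route { repo: string }
interface TemplateOpts { org: string; baseBranch: string }

interface TaskTemplate {
  render(item: WorkItem, route: Route, opts: TemplateOpts): string
}

const githubPrTemplate: TaskTemplate = {
  render(item: WorkItem, route: Route, opts: TemplateOpts): string {
    return [
      `Task: ${item.title}`,
      `Source: ${item.source.provider} ${item.source.externalId}`,
      `Repository: ${opts.org}/${route.repo}`,
      '',
      item.description,
      '',
      `When finished, run: gh pr create --base ${opts.baseBranch}`,
    ].join('\n')
  },
}
```

Calling `githubPrTemplate.render(item, { repo }, { org: 'AgentWorkforce', baseBranch: 'main' })` would yield the current hardcoded strings; a different output target would supply its own template instead.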

Mount paths (`/linear/issues/**` etc.) move into each provider impl — the factory core only sees `WorkItem` events from a subscription, never raw paths.

---

## 6. Onboarding: config-driven connect flow

Right now there is no onboarding — the factory assumes all integrations are already connected. For a proper cloud product, declaring a provider in config should trigger a connect flow automatically.

### Desired behavior

```yaml
# factory.config.yaml
intentProvider: linear
clarificationChannel: slack
outputTarget: github
```

Running `factory init` (or deploying for the first time) checks connection state for each declared provider and kicks off the appropriate auth flow for anything not yet connected.

### Connection state model

```typescript
type ConnectionStatus = 'connected' | 'needs-auth' | 'missing-scopes' | 'error'

interface ProviderConnection {
  provider: string
  status: ConnectionStatus
  requiredScopes: string[]
  connectedScopes?: string[]
  authUrl?: string        // populated when status === 'needs-auth'
  error?: string
}
```

### Connect flow

1. **`factory check`** — reads config, calls `checkConnection()` on each declared provider impl, prints a status table:
   ```
   linear       ✓ connected
   slack        ✗ needs-auth  → run: factory connect slack
   github       ✓ connected
   ```

2. **`factory connect <provider>`** — opens the OAuth flow (browser redirect or device code), stores the token via the relayfile credential store, then re-runs `checkConnection()` to confirm

3. **On startup** — the factory refuses to start if any declared provider returns `needs-auth` or `missing-scopes`, and prints an actionable error plus the connect command

4. **Provider interface gains a connect method**:
   ```typescript
   interface WorkItemProvider {
     // ...existing methods
     checkConnection(): Promise<ProviderConnection>
     connect(opts: { interactive: boolean }): Promise<void>
   }
   ```
   Same pattern for `ClarificationChannel` and `OutputTarget`.
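
The startup guard from step 3 can be sketched as a pure function over the `ProviderConnection` list. `startupBlockers` and `assertAllConnected` are hypothetical names, and the message format is an assumption modeled on the `factory check` table. The sketch blocks only on the two statuses named in step 3; how `error` should be handled is left open.

```typescript
type ConnectionStatus = 'connected' | 'needs-auth' | 'missing-scopes' | 'error'

interface ProviderConnection {
  provider: string
  status: ConnectionStatus
  requiredScopes: string[]
  connectedScopes?: string[]
  authUrl?: string
  error?: string
}

// One actionable line per provider that blocks startup.
function startupBlockers(connections: ProviderConnection[]): string[] {
  return connections
    .filter((c) => c.status === 'needs-auth' || c.status === 'missing-scopes')
    .map((c) => `${c.provider}: ${c.status} (run: factory connect ${c.provider})`)
}

// Throws with the full list so the operator can fix every provider in one pass.
function assertAllConnected(connections: ProviderConnection[]): void {
  const blockers = startupBlockers(connections)
  if (blockers.length > 0) {
    throw new Error(`Factory cannot start:\n${blockers.join('\n')}`)
  }
}
```

Collecting all blockers before throwing avoids a fix-one-rerun loop when several providers are unconnected.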

### Scope enforcement

Each provider impl declares its required scopes. The connect flow requests exactly those. On startup, `checkConnection()` diffs connected scopes against required — `missing-scopes` means re-run connect to upgrade permissions.
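
The scope diff itself is a small set difference. A minimal sketch, assuming that a provider with no recorded scopes has never been authorized and should report `needs-auth`; `checkScopes` and `ScopeCheck` are hypothetical names:

```typescript
type ConnectionStatus = 'connected' | 'needs-auth' | 'missing-scopes' | 'error'

interface ScopeCheck {
  status: ConnectionStatus
  missing: string[]
}

function checkScopes(requiredScopes: string[], connectedScopes?: string[]): ScopeCheck {
  // Assumption: no recorded scopes at all means the provider was never authorized.
  if (connectedScopes === undefined) {
    return { status: 'needs-auth', missing: [...requiredScopes] }
  }
  const granted = new Set(connectedScopes)
  const missing = requiredScopes.filter((scope) => !granted.has(scope))
  return { status: missing.length > 0 ? 'missing-scopes' : 'connected', missing }
}
```

Returning the `missing` list alongside the status lets `factory check` print which permissions a re-run of connect would add.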

---

## Extraction readiness

`packages/factory-sdk` is already structurally well-isolated. Remaining work:

1. **`WorkItem` type + provider role interfaces** — decouple core from Linear/Slack/GitHub types
2. **`StateStore` port** — in-memory + DynamoDB + Durable Objects impls
3. **Onboarding: `checkConnection` + connect flow** — per-provider, triggered by config
4. **Make heartbeat optional** — Electron-specific; replace with `/healthz` or remove
5. **Delete `FactoryManager`** — thin Electron IPC glue, not needed in cloud
6. **Remove PR sweep** — once relayfile push confirmed reliable for GitHub events
7. **Pluggable `TaskTemplate`** — remove hardcoded org/tool references from dispatch
8. **Cloud adapter packages** — `factory-sdk-aws`, `factory-sdk-cloudflare`

**Estimate: 2–3 weeks** to reach clean extraction, multi-cloud, and onboarding flow.

---

## Action items

- [ ] Verify/add HTTP webhook delivery mode in relayfile (critical path)
- [ ] Confirm relayfile pushes `/github/repos/**` on PR draft→ready transitions
- [ ] Define `WorkItem`, `WorkItemProvider`, `ClarificationChannel`, `OutputTarget` interfaces
- [ ] Migrate `LinearIssue` → `WorkItem` throughout factory core
- [ ] Add `StateStore` port with in-memory + DynamoDB + Durable Objects impls
- [ ] Implement `checkConnection()` + `connect()` on all provider interfaces
- [ ] Build `factory check` and `factory connect <provider>` CLI commands
- [ ] Add startup connection guard (refuse to run with unconnected providers)
- [ ] Make heartbeat/registry optional
- [ ] Remove `#sweepPrStateCompletions` once relayfile push confirmed
- [ ] Implement cloud handler entry points (`factory-sdk-aws`, `factory-sdk-cloudflare`)
- [ ] Publish `factory-sdk` as standalone package once second consumer exists

