Factory: go serverless via relayfile webhook delivery + remove polling loops #347

Description

@khaliqgant

Background

The factory pipeline (Linear intent → Slack clarification → GitHub PR) is currently implemented as a long-running daemon process spawned by Electron. The architecture already has the right abstractions (FactoryPorts, clean LinearWriteback/SlackWriteback/GithubRead interfaces, RelayfileCloudMountClient). Four things still block a production-grade, multi-tenant cloud deployment: the process model, the polling loops, hardcoded service coupling, and the lack of an onboarding flow.


1. Remove polling loops (they're already redundant)

RelayFileSync from @relayfile/sdk is a WebSocket push stream. When baseUrl (handle.info.relayfileUrl) is present, the factory already receives push events for:

  • /linear/issues/** — issue state changes
  • /slack/channels/** — Slack messages
  • /github/repos/** — PR state

The polling loops in factory.ts are defensive fallbacks that shouldn't be the primary model:

| Loop | Interval | Status |
| --- | --- | --- |
| Event-client poll | 5s | Fallback after 5 WebSocket errors; fine to keep as a safety net |
| Slack reply poll | 5s | Second fallback if mount.subscribe() throws; also fine to keep as a safety net |
| PR completion sweep | 15s | Eliminate if relayfile reliably pushes /github/repos/** on draft→ready transitions |
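
The "fallback after 5 WebSocket errors" rule in the table could look roughly like the sketch below. The class name `PollFallbackGate` is hypothetical, and counting consecutive errors (reset by any successful push) is an assumption; factory.ts may count errors differently.

```typescript
// Counts consecutive WebSocket errors and signals when polling should take over.
// Hypothetical helper; the threshold of 5 comes from the table above.
class PollFallbackGate {
  private consecutiveErrors = 0;
  private readonly threshold: number;

  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }

  recordError(): void {
    this.consecutiveErrors += 1;
  }

  // Any successfully received push event resets the counter (assumption).
  recordPush(): void {
    this.consecutiveErrors = 0;
  }

  shouldPoll(): boolean {
    return this.consecutiveErrors >= this.threshold;
  }
}
```

With this shape, push stays the primary model and the 5s poll only runs while `shouldPoll()` is true.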

2. The key unlock: relayfile webhook delivery

Does relayfile support HTTP webhook delivery (POST to an endpoint) in addition to WebSocket?

If yes, the factory becomes a pure serverless app with no persistent connections:

relayfile HTTP webhook → API Gateway / Cloudflare Worker → handler
  /linear/issues/** updated  → triage() + dispatch()
  /slack/channels/** updated → route clarification to agent
  /github/repos/** updated   → completeIssue() if PR non-draft
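
The routing above can be sketched as a small path-prefix dispatcher. This is only an illustration: the `RelayfileEvent` shape is an assumption (relayfile's webhook payload format is exactly what still needs verifying), and the handlers return placeholder strings instead of calling triage(), dispatch(), or completeIssue().

```typescript
// Hypothetical event shape; relayfile's real webhook payload is not confirmed.
interface RelayfileEvent {
  path: string; // e.g. "/linear/issues/ABC-1"
  payload: unknown;
}

type Handler = (event: RelayfileEvent) => Promise<string>;

// Ordered route table: the first matching prefix wins.
const routes: Array<[string, Handler]> = [
  ["/linear/issues/", async (e) => `triage+dispatch:${e.path}`],
  ["/slack/channels/", async (e) => `route-clarification:${e.path}`],
  ["/github/repos/", async (e) => `complete-if-non-draft:${e.path}`],
];

async function routeEvent(event: RelayfileEvent): Promise<string> {
  for (const [prefix, handler] of routes) {
    if (event.path.startsWith(prefix)) return handler(event);
  }
  // Unknown paths are acknowledged but not acted on.
  return "ignored";
}
```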

If relayfile only supports WebSocket, the fallback is a single lightweight always-on relay worker that maintains the WebSocket and fans out to handlers per event.

This is the most important thing to verify/build in relayfile.


3. Serverless architecture

Today:
  Electron → FactoryManager → factory daemon (long-running local process)

Target:
  relayfile webhook → serverless handler (stateless) → durable state → FleetClient (remote agents)

State that moves from in-memory to durable storage:

  • BatchTracker (in-flight issues, queued issues, agent→issue map)
  • InFlightRegistry (spawned agent records)
  • Slack "waiting-for-clarification" state (currently held open in #watchSlackThread)

The Slack wait becomes a state machine:

dispatch() → low confidence triage
  → write {issueId, slackThreadId, status: 'waiting-clarification'} to durable state
  → handler exits

Slack reply webhook arrives → new handler invocation
  → look up record by threadId → inject answer → spawn agents
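
A minimal sketch of those two invocations, using a Map as a stand-in for durable state. The function names and the return shape of `onSlackReply` are hypothetical; the record fields follow the ones listed above.

```typescript
interface ClarificationRecord {
  issueId: string;
  slackThreadId: string;
  status: "waiting-clarification";
}

// Stand-in for durable storage (DynamoDB, Durable Objects, ...).
const waiting = new Map<string, ClarificationRecord>();

// Invocation 1: triage confidence is low, so persist the wait and exit.
function onLowConfidenceTriage(issueId: string, slackThreadId: string): void {
  waiting.set(slackThreadId, { issueId, slackThreadId, status: "waiting-clarification" });
}

// Invocation 2: a Slack reply webhook arrives in a fresh handler.
// Returns the issue to resume, or null if no wait is recorded for the thread.
function onSlackReply(
  threadId: string,
  answer: string,
): { issueId: string; answer: string } | null {
  const record = waiting.get(threadId);
  if (!record) return null;
  waiting.delete(threadId); // clear the wait so a duplicate delivery is a no-op
  return { issueId: record.issueId, answer };
}
```

Deleting the record before resuming is one way to make redelivered webhooks harmless; whether the real handler should delete first or after agents spawn is a design choice left open here.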

4. Multi-cloud: AWS and Cloudflare (to start)

The StateStore port must be cloud-agnostic. Factory core has zero cloud-specific imports.

AWS

  • Handler: Lambda (Node.js runtime)
  • State: DynamoDB
  • Ingress: API Gateway → Lambda
  • Cron: EventBridge Scheduler

Cloudflare

  • Handler: Cloudflare Workers
  • State: Durable Objects (strongly-consistent batch tracking) + KV (read-heavy lookups)
  • Ingress: Workers route — no API Gateway needed
  • Cron: Workers Cron Triggers

StateStore port

interface StateStore {
  getInFlight(issueId: string): Promise<InFlightRecord | null>
  putInFlight(record: InFlightRecord): Promise<void>
  deleteInFlight(issueId: string): Promise<void>
  listInFlight(): Promise<InFlightRecord[]>
  getWaitingClarification(threadId: string): Promise<ClarificationRecord | null>
  putWaitingClarification(record: ClarificationRecord): Promise<void>
  deleteWaitingClarification(threadId: string): Promise<void>
}
// Impls: DynamoStateStore, DurableObjectStateStore, InMemoryStateStore
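
For reference, an InMemoryStateStore could be as small as the sketch below. The fields of `InFlightRecord` and `ClarificationRecord` are assumptions, since the issue does not define them; only the method signatures come from the port above.

```typescript
// Assumed record shapes; the issue does not define them.
interface InFlightRecord { issueId: string; agentId: string }
interface ClarificationRecord { threadId: string; issueId: string }

interface StateStore {
  getInFlight(issueId: string): Promise<InFlightRecord | null>;
  putInFlight(record: InFlightRecord): Promise<void>;
  deleteInFlight(issueId: string): Promise<void>;
  listInFlight(): Promise<InFlightRecord[]>;
  getWaitingClarification(threadId: string): Promise<ClarificationRecord | null>;
  putWaitingClarification(record: ClarificationRecord): Promise<void>;
  deleteWaitingClarification(threadId: string): Promise<void>;
}

class InMemoryStateStore implements StateStore {
  private readonly inFlight = new Map<string, InFlightRecord>();
  private readonly waiting = new Map<string, ClarificationRecord>();

  async getInFlight(issueId: string): Promise<InFlightRecord | null> {
    return this.inFlight.get(issueId) ?? null;
  }
  async putInFlight(record: InFlightRecord): Promise<void> {
    this.inFlight.set(record.issueId, record);
  }
  async deleteInFlight(issueId: string): Promise<void> {
    this.inFlight.delete(issueId);
  }
  async listInFlight(): Promise<InFlightRecord[]> {
    return Array.from(this.inFlight.values());
  }
  async getWaitingClarification(threadId: string): Promise<ClarificationRecord | null> {
    return this.waiting.get(threadId) ?? null;
  }
  async putWaitingClarification(record: ClarificationRecord): Promise<void> {
    this.waiting.set(record.threadId, record);
  }
  async deleteWaitingClarification(threadId: string): Promise<void> {
    this.waiting.delete(threadId);
  }
}
```

This variant is mainly useful for unit tests and local runs; it loses all state when the process exits, which is the problem the durable impls are meant to solve.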

Handler entry points are thin adapters:

// aws/handler.ts
export const handler = async (event: APIGatewayProxyEvent) => {
  const factory = buildFactory({ store: new DynamoStateStore(), fleet: ... })
  return factory.handleWebhook(event.body)
}

// cloudflare/worker.ts
export default {
  fetch(request: Request, env: Env) {
    const factory = buildFactory({ store: new DurableObjectStateStore(env.STATE), fleet: ... })
    return factory.handleWebhook(request)
  }
}

5. Provider abstraction (remove hardcoded Linear/Slack/GitHub coupling)

The factory core is currently coupled to three specific services at every level. Coupling lives in: mount paths (/linear/issues/**), the LinearIssue type, Linear state UUIDs, Slack timestamp format, GitHub isDraft field, and dispatch templates (gh pr create, AgentWorkforce/${repo}).

WorkItem — replace LinearIssue everywhere

interface WorkItem {
  id: string
  title: string
  description: string
  state: string                      // abstract name, not a provider UUID
  labels: string[]
  source: { provider: string; externalId: string; url?: string }
  metadata: Record<string, unknown>  // provider-specific extras pass through opaquely
}
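
As an illustration of where the Linear-specific details would go, here is a sketch of an adapter from a Linear-shaped issue to WorkItem. `LinearIssueLike` and its field names are assumptions, not the real LinearIssue type; the state UUID is translated to an abstract name through a lookup table supplied by the provider impl.

```typescript
interface WorkItem {
  id: string;
  title: string;
  description: string;
  state: string;
  labels: string[];
  source: { provider: string; externalId: string; url?: string };
  metadata: Record<string, unknown>;
}

// Hypothetical subset of a Linear issue; field names are assumptions.
interface LinearIssueLike {
  id: string;         // provider-internal ID
  identifier: string; // human-readable key, e.g. "ABC-12"
  title: string;
  description?: string;
  stateId: string;    // provider state UUID
  labelNames: string[];
  url?: string;
}

function toWorkItem(
  issue: LinearIssueLike,
  stateNames: Record<string, string>, // state UUID → abstract state name
): WorkItem {
  return {
    id: issue.identifier,
    title: issue.title,
    description: issue.description ?? "",
    state: stateNames[issue.stateId] ?? "unknown",
    labels: issue.labelNames,
    source: { provider: "linear", externalId: issue.id, url: issue.url },
    // The raw state UUID passes through opaquely for the provider's own use.
    metadata: { stateId: issue.stateId },
  };
}
```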

Role-based ports — rename from provider names to roles

// Was: LinearWriteback
interface WorkItemProvider {
  subscribe(onChange: (item: WorkItem) => void): Subscription
  setState(id: string, state: 'dispatched' | 'done'): Promise<void>
  postComment(id: string, body: string): Promise<void>
  isReadyForDispatch(item: WorkItem): boolean
}

// Was: SlackWriteback
interface ClarificationChannel {
  openThread(context: { title: string; body: string }): Promise<string>
  reply(threadId: string, text: string): Promise<void>
  subscribe(threadId: string, onReply: (text: string) => void): Subscription
}

// Was: GithubRead + GithubMergeGate
interface OutputTarget {
  isPrComplete(workItemId: string): Promise<boolean>
}

Concrete impls: LinearWorkItemProvider, SlackClarificationChannel, GithubOutputTarget. Adding Jira means writing JiraWorkItemProvider — nothing else in the factory changes.
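
To show that the core only needs the role interface, here is a sketch of an in-memory WorkItemProvider, the kind of fake a test suite (or a future Jira impl) would start from. `Subscription` is not defined in this issue, so its `unsubscribe()` shape is an assumption, as are the "factory" label rule and the `emit` helper. WorkItem is trimmed to the fields used here.

```typescript
interface WorkItem { id: string; title: string; state: string; labels: string[] } // trimmed
interface Subscription { unsubscribe(): void } // assumed shape

interface WorkItemProvider {
  subscribe(onChange: (item: WorkItem) => void): Subscription;
  setState(id: string, state: "dispatched" | "done"): Promise<void>;
  postComment(id: string, body: string): Promise<void>;
  isReadyForDispatch(item: WorkItem): boolean;
}

class InMemoryWorkItemProvider implements WorkItemProvider {
  private readonly listeners = new Set<(item: WorkItem) => void>();
  readonly states = new Map<string, string>();
  readonly comments: Array<{ id: string; body: string }> = [];

  subscribe(onChange: (item: WorkItem) => void): Subscription {
    this.listeners.add(onChange);
    return { unsubscribe: () => { this.listeners.delete(onChange); } };
  }
  async setState(id: string, state: "dispatched" | "done"): Promise<void> {
    this.states.set(id, state);
  }
  async postComment(id: string, body: string): Promise<void> {
    this.comments.push({ id, body });
  }
  // Assumed readiness rule for illustration: the item carries a "factory" label.
  isReadyForDispatch(item: WorkItem): boolean {
    return item.labels.includes("factory");
  }
  // Test helper (not part of the port): simulate an upstream change event.
  emit(item: WorkItem): void {
    this.listeners.forEach((listener) => listener(item));
  }
}
```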

Pluggable task template — replace hardcoded dispatch strings

The dispatch template today hardcodes gh pr create --base main, AgentWorkforce/${repo}, Linear issue: ${key}. Make it injectable:

interface TaskTemplate {
  render(item: WorkItem, route: Route, opts: TemplateOpts): string
}
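
One possible implementation, as a sketch. `Route` and `TemplateOpts` are not defined in this issue, so the fields used here (`repo`, `org`, `baseBranch`) are assumptions chosen to cover the values that are hardcoded today.

```typescript
interface WorkItem {
  id: string;
  title: string;
  description: string;
  state: string;
  labels: string[];
  source: { provider: string; externalId: string; url?: string };
  metadata: Record<string, unknown>;
}

// Assumed shapes; the issue leaves Route and TemplateOpts undefined.
interface Route { repo: string }
interface TemplateOpts { org: string; baseBranch: string }

interface TaskTemplate {
  render(item: WorkItem, route: Route, opts: TemplateOpts): string;
}

// Hypothetical GitHub-flavoured template: org, repo, and base branch are
// injected instead of being baked into the dispatch string.
const githubPrTemplate: TaskTemplate = {
  render(item, route, opts) {
    return [
      `Task: ${item.title}`,
      `Repository: ${opts.org}/${route.repo}`,
      `Source: ${item.source.provider} ${item.source.externalId}`,
      `When done: gh pr create --base ${opts.baseBranch}`,
    ].join("\n");
  },
};
```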

Mount paths (/linear/issues/** etc.) move into each provider impl — the factory core only sees WorkItem events from a subscription, never raw paths.


6. Onboarding: config-driven connect flow

Right now there is no onboarding — the factory assumes all integrations are already connected. For a proper cloud product, declaring a provider in config should trigger a connect flow automatically.

Desired behavior

# factory.config.yaml
intentProvider: linear
clarificationChannel: slack
outputTarget: github

Running factory init (or deploying for the first time) checks connection state for each declared provider and kicks off the appropriate auth flow for anything not yet connected.

Connection state model

type ConnectionStatus = 'connected' | 'needs-auth' | 'missing-scopes' | 'error'

interface ProviderConnection {
  provider: string
  status: ConnectionStatus
  requiredScopes: string[]
  connectedScopes?: string[]
  authUrl?: string        // populated when status === 'needs-auth'
  error?: string
}

Connect flow

  1. factory check — reads config, calls checkConnection() on each declared provider impl, prints a status table:

    linear       ✓ connected
    slack        ✗ needs-auth  → run: factory connect slack
    github       ✓ connected
    
  2. factory connect <provider> — opens OAuth flow (browser redirect or device code), stores token via relayfile credential store, re-runs checkConnection() to confirm

  3. On startup — factory refuses to start if any declared provider returns needs-auth or missing-scopes, printing actionable error + connect command

  4. Provider interface gains a connect method:

    interface WorkItemProvider {
      // ...existing methods
      checkConnection(): Promise<ProviderConnection>
      connect(opts: { interactive: boolean }): Promise<void>
    }

    Same pattern for ClarificationChannel and OutputTarget.
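
Steps 1 and 3 could be sketched as follows. `formatRow` and `assertAllConnected` are hypothetical names; how the 'error' status should be treated at startup is not specified above, so this sketch only blocks on needs-auth and missing-scopes, as step 3 states.

```typescript
type ConnectionStatus = "connected" | "needs-auth" | "missing-scopes" | "error";

interface ProviderConnection {
  provider: string;
  status: ConnectionStatus;
  requiredScopes: string[];
  connectedScopes?: string[];
  authUrl?: string;
  error?: string;
}

// One row of the `factory check` status table.
function formatRow(c: ProviderConnection): string {
  const name = c.provider.padEnd(13);
  if (c.status === "connected") return `${name}✓ connected`;
  if (c.status === "error") return `${name}✗ error: ${c.error ?? "unknown"}`;
  return `${name}✗ ${c.status}  → run: factory connect ${c.provider}`;
}

// Startup guard: throw with an actionable message if any declared provider
// still needs auth or is missing scopes.
function assertAllConnected(connections: ProviderConnection[]): void {
  const blocked = connections.filter(
    (c) => c.status === "needs-auth" || c.status === "missing-scopes",
  );
  if (blocked.length > 0) {
    throw new Error(`Factory cannot start:\n${blocked.map(formatRow).join("\n")}`);
  }
}
```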

Scope enforcement

Each provider impl declares its required scopes. The connect flow requests exactly those. On startup, checkConnection() diffs connected scopes against required — missing-scopes means re-run connect to upgrade permissions.
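
The scope diff itself is a set difference. A minimal sketch, assuming that an absent `connectedScopes` means the provider has never been connected (that mapping to needs-auth is an assumption, and the scope names in the usage note are made up):

```typescript
type ConnectionStatus = "connected" | "needs-auth" | "missing-scopes" | "error";

// Returns the required scopes that the stored token does not carry.
function missingScopes(required: string[], connected: string[]): string[] {
  const have = new Set(connected);
  return required.filter((scope) => !have.has(scope));
}

// Derives a status from scopes alone; 'error' would come from the provider
// call failing and is out of scope for this helper.
function statusFromScopes(
  required: string[],
  connected: string[] | undefined,
): ConnectionStatus {
  if (connected === undefined) return "needs-auth"; // assumption: no token stored yet
  return missingScopes(required, connected).length === 0 ? "connected" : "missing-scopes";
}
```

For example, with required scopes `["read", "write"]` and a token carrying only `["read"]`, `missingScopes` returns `["write"]` and the status is missing-scopes, which is the cue to re-run connect.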


Extraction readiness

packages/factory-sdk is already structurally well-isolated. Remaining work:

  1. WorkItem type + provider role interfaces — decouple core from Linear/Slack/GitHub types
  2. StateStore port — in-memory + DynamoDB + Durable Objects impls
  3. Onboarding: checkConnection + connect flow — per-provider, triggered by config
  4. Make heartbeat optional — Electron-specific; replace with /healthz or remove
  5. Delete FactoryManager — thin Electron IPC glue, not needed in cloud
  6. Remove PR sweep — once relayfile push confirmed reliable for GitHub events
  7. Pluggable TaskTemplate — remove hardcoded org/tool references from dispatch
  8. Cloud adapter packages: factory-sdk-aws, factory-sdk-cloudflare

Estimate: 2–3 weeks to reach clean extraction, multi-cloud, and onboarding flow.


Action items

  • Verify/add HTTP webhook delivery mode in relayfile (critical path)
  • Confirm relayfile pushes /github/repos/** on PR draft→ready transitions
  • Define WorkItem, WorkItemProvider, ClarificationChannel, OutputTarget interfaces
  • Migrate LinearIssue → WorkItem throughout factory core
  • Add StateStore port with in-memory + DynamoDB + Durable Objects impls
  • Implement checkConnection() + connect() on all provider interfaces
  • Build factory check and factory connect <provider> CLI commands
  • Add startup connection guard (refuse to run with unconnected providers)
  • Make heartbeat/registry optional
  • Remove #sweepPrStateCompletions once relayfile push confirmed
  • Implement cloud handler entry points (factory-sdk-aws, factory-sdk-cloudflare)
  • Publish factory-sdk as standalone package once second consumer exists

Metadata

    Labels

    enhancement (New feature or request)
