diff --git a/docs/agent-workflow.md b/docs/agent-workflow.md
new file mode 100644
index 0000000..8eacc4a
--- /dev/null
+++ b/docs/agent-workflow.md
@@ -0,0 +1,237 @@
+# Generic Agent Workflow — Mission Control Assignment Layer
+
+> **Issue:** [misospace/mission-control#59](https://github.com/misospace/mission-control/issues/59)
+> **Date:** 2026-05-16
+
+This document defines the operational workflow for any agent using Mission Control's assignment layer.
+It is intentionally generic — no specific agent names or implementations are referenced.
+
+## Overview
+
+Mission Control provides a Postgres-backed cache of GitHub Issues that agents use to discover, claim, and track work.
+The cache is refreshed periodically via sync endpoints; all state changes flow through the API, which writes back to GitHub.
+
+> **GitHub Issues and PRs remain the source of truth.** Mission Control's database is a cache, not authoritative storage.
+
+## Prerequisites
+
+- An agent identity string (e.g. `"agent-name"`). This maps to `agent/agent-name` labels on issues.
+- A `MISSION_CONTROL_AGENT_TOKEN` environment variable with a valid bearer token for authenticated endpoints.
+- The base URL of the Mission Control instance (e.g. `https://mc.example.com` or `http://localhost:3000`).
+
+## Complete Workflow
+
+### 1. Start a Run
+
+Before doing any work, start an agent run record. This creates an audit trail entry and enables visibility into agent activity.
+
+```
+POST /api/agent-runs
+Authorization: Bearer
+Content-Type: application/json
+```
+
+**Request body:**
+
+```json
+{
+  "agentName": "",
+  "runType": "heartbeat",
+  "status": "in-progress",
+  "startedAt": "2026-05-16T05:00:00.000Z"
+}
+```
+
+**Required fields:** `agentName`, `runType`, `status`, `startedAt`
+**Optional fields:** `finishedAt`, `summary`, `errorMessage`, `touchedIssueUrls`, `issueId`
+
+**Response:** `201 Created` with the created run object.
+
+### 2. Sync Issue State
+
+Refresh Mission Control's cache of GitHub Issues before selecting work. This fetches the latest issue state from GitHub.
+
+```
+POST /api/sync
+Content-Type: application/json
+```
+
+**Auth:** None required (public endpoint).
+
+**Expected response:** `{ syncedCount: N }` where N is the number of issues synced.
+
+**Failure handling:** Treat any non-2xx, timeout, or network error as a freshness warning — log it and continue. **Do not fail the workflow on a sync failure.**
+
+### 3. Request Agent Queue
+
+Fetch the list of issues actionable for this agent, ranked by priority and status.
+
+```
+GET /api/agents//queue
+```
+
+**Auth:** None required (public endpoint).
+
+**Response:** Array of issue objects containing `number`, `title`, `url`, and `labels`.
+
+**Selection priority:**
+1. Prefer issues labeled `agent/` if present.
+2. Fall back to general backlog if no agent-specific label exists.
+3. Treat "no status label" or `status/backlog` as backlog work — both are valid entry states.
+
+Issues with an existing `agent/` label remain visible in the queue so agents can see what others are working on.
+
+### 4. Claim Work
+
+Claim an issue by requesting an agent assignment through Mission Control. This adds an `agent/` label to the issue on GitHub and optionally moves it to `status/in-progress`.
+
+```
+POST /api/issues/claim
+Authorization: Bearer
+Content-Type: application/json
+```
+
+**Request body:**
+
+```json
+{
+  "issueId": "",
+  "repoFullName": "org/repo",
+  "issueNumber": 123,
+  "agentName": "",
+  "force": false
+}
+```
+
+**Required fields:** `issueId`, `repoFullName`, `issueNumber`, `agentName`
+**Optional fields:** `force` (boolean, default `false`)
+
+**Behavior:**
+- **Normal claim (`force: false`):** Succeeds if the issue is open and not already assigned to another agent. Adds `agent/` label and optionally `status/in-progress`.
+- **Force claim (`force: true`):** Removes any existing `agent/` label before adding the new one. Useful for reassignment.
+- **Rejected (409):** Issue is already assigned to a different agent and `force` is not set.
+- **Rejected (400):** Issue is closed or has `status/done`.
+
+**Response:** `{ success: true, labels: [...] }` on success.
+
+### 5. Report Run Status
+
+When the agent finishes its work cycle, report run status to close out the run record.
+
+```
+POST /api/agent-runs
+Authorization: Bearer
+Content-Type: application/json
+```
+
+**Request body:**
+
+```json
+{
+  "agentName": "",
+  "runType": "heartbeat",
+  "status": "completed",
+  "startedAt": "2026-05-16T05:00:00.000Z",
+  "finishedAt": "2026-05-16T05:30:00.000Z",
+  "summary": "Processed 3 issues, opened 1 PR",
+  "touchedIssueUrls": [
+    "https://github.com/org/repo/issues/123",
+    "https://github.com/org/repo/pull/456"
+  ]
+}
+```
+
+**Response:** `201 Created` with the created run object.
+
+### 6. Unclaim Work (Optional)
+
+Release an issue back to the pool when the agent can no longer work on it.
+
+```
+POST /api/issues/unclaim
+Authorization: Bearer
+Content-Type: application/json
+```
+
+**Request body:**
+
+```json
+{
+  "issueId": "",
+  "repoFullName": "org/repo",
+  "issueNumber": 123,
+  "agentName": ""
+}
+```
+
+**Required fields:** `issueId`, `repoFullName`, `issueNumber`, `agentName`
+
+**Behavior:**
+- Removes the `agent/` label from the issue on GitHub.
+- Returns 400 if the agent is not assigned to this issue, or if it's closed/done.
+
+**Response:** `{ success: true, labels: [...] }` on success.
+
+### 7. Move Issue Status (Optional)
+
+Move an issue between board columns by updating its status label.
+
+```
+POST /api/issues/move
+Authorization: Bearer
+Content-Type: application/json
+```
+
+**Auth:** Bearer token required for agents.
+
+This endpoint writes to the audit log and updates both GitHub labels and the local cache.
+
+## Source of Truth Rules
+
+| Rule | Detail |
+|------|--------|
+| GitHub is authoritative | Issues and PRs on GitHub are the single source of truth. Mission Control's Postgres is a cache. |
+| No direct DB writes | Never query or write to the Postgres cache directly — use the API. |
+| No auto-close | Do not auto-close issues without explicit evidence of completion (green pipeline, merged PR, or human approval). |
+
+## Security Constraints
+
+| Constraint | Detail |
+|------------|--------|
+| Never log agent tokens | `MISSION_CONTROL_AGENT_TOKEN` must never be logged, echoed, or persisted to disk. |
+| Never log GitHub tokens | Same constraint applies to all GitHub authentication tokens. |
+| Audit trail required | Every state-changing move on Mission Control produces an `AuditLog` row. Operators trace agent activity through `/api/audit`. |
+
+## Failure Modes
+
+All Mission Control interactions are best-effort from the agent's perspective:
+
+1. **Sync failure** → Log warning, continue workflow.
+2. **Run POST failure** → Log warning, continue. The run record is a visibility aid, not a gating dependency.
+3. **Queue fetch failure** → Fall back to GitHub Issues API directly.
+4. **Claim failure** → Retry with `force: true` if appropriate, or skip the issue and try the next one.
+5. **Health check failure** → If `/api/health` returns `{ ok: false }` with 503, the database may be unreachable but Mission Control itself is still responsive.
+
+**Critical principle:** Mission Control failures must never crash agent heartbeat or workflow runs. The agent should always fall back to GitHub as the source of truth.
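
The best-effort rules above (sync, queue, claim) can be sketched as a small client loop. This is a minimal illustration under stated assumptions, not Mission Control's own client: `http_send`, `pick_and_claim`, and the `send` callable are hypothetical names. The sketch also assumes that each queue item carries an `id` field usable as `issueId` (the documented queue response lists only `number`, `title`, `url`, and `labels`) and that `repoFullName` can be derived from the issue's GitHub `url`.

```python
import json
import logging
import urllib.error
import urllib.request
from typing import Any, Callable, Optional, Tuple
from urllib.parse import urlparse

log = logging.getLogger("mc-agent")

# Hypothetical transport: (method, path, body, token) -> (status, parsed JSON or None)
Send = Callable[[str, str, Optional[dict], Optional[str]], Tuple[int, Any]]


def http_send(base_url: str) -> Send:
    """Build a stdlib-only transport bound to a Mission Control base URL."""
    def send(method, path, body=None, token=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(base_url + path, data=data, method=method)
        req.add_header("Content-Type", "application/json")
        if token:
            # The token goes into the header only; it is never logged.
            req.add_header("Authorization", f"Bearer {token}")
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status, json.loads(resp.read() or b"null")
        except urllib.error.HTTPError as err:
            return err.code, None  # 4xx/5xx become a status code, not an exception
    return send


def pick_and_claim(send: Send, agent: str, token: str) -> Optional[dict]:
    """Sync, fetch the queue, and claim the first claimable issue (None if none)."""
    # Sync is a freshness hint: log any failure and carry on.
    try:
        status, _ = send("POST", "/api/sync", None, None)
        if not 200 <= status < 300:
            log.warning("sync returned %s; cache may be stale", status)
    except Exception as exc:
        log.warning("sync failed (%s); continuing", exc)

    # Queue failure: return None so the caller can fall back to GitHub directly.
    try:
        status, queue = send("GET", f"/api/agents/{agent}/queue", None, None)
    except Exception as exc:
        log.warning("queue fetch failed (%s)", exc)
        return None
    if status != 200 or not isinstance(queue, list):
        return None

    # Claim in queue order; a rejection (409 taken, 400 closed/done) means skip.
    for issue in queue:
        repo = "/".join(urlparse(issue["url"]).path.strip("/").split("/")[:2])
        body = {
            "issueId": issue.get("id"),  # ASSUMPTION: queue items expose an id
            "repoFullName": repo,        # ASSUMPTION: derived from the GitHub URL
            "issueNumber": issue["number"],
            "agentName": agent,
            "force": False,
        }
        try:
            status, result = send("POST", "/api/issues/claim", body, token)
        except Exception as exc:
            log.warning("claim of #%s failed (%s)", issue["number"], exc)
            continue
        if 200 <= status < 300 and result and result.get("success"):
            return issue
        log.info("claim of #%s rejected with %s; trying next", issue["number"], status)
    return None
```

A caller might run `pick_and_claim(http_send("http://localhost:3000"), "agent-name", os.environ["MISSION_CONTROL_AGENT_TOKEN"])` and treat a `None` result as the signal to fall back to the GitHub Issues API. The sketch never retries with `force: true`, because forced reassignment is the kind of decision the workflow leaves to the agent's own judgment.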
+
+## API Reference Summary
+
+| Endpoint | Method | Auth | Purpose |
+|----------|--------|------|---------|
+| `/api/health` | GET | None | Health check — `{ ok: true, database: "ok" }` |
+| `/api/sync` | POST | None | Trigger issue sync from GitHub |
+| `/api/issues` | GET | None | List all issues in Mission Control cache |
+| `/api/agents//queue` | GET | None | Agent-specific issue queue |
+| `/api/issues/claim` | POST | Bearer token | Claim an issue (adds agent label) |
+| `/api/issues/unclaim` | POST | Bearer token | Release an issue (removes agent label) |
+| `/api/issues/actions` | POST | None | Assign/unassign agent or owner labels |
+| `/api/issues/unassign` | POST | None | Remove all agent/owner labels of a type |
+| `/api/issues/move` | POST | Bearer token | Move an issue on the board |
+| `/api/agent-runs` | GET | None | List recent agent runs |
+| `/api/agent-runs` | POST | Bearer token | Submit a new agent run record |
+| `/api/automation/repos` | GET | None | List tracked repositories |
+| `/api/audit` | GET | None | Query audit log entries |
+
+## History
+
+- **2026-05-16** — Created as part of generic agent workflow documentation (Issue #59). Covers the complete lifecycle: start run, sync state, request queue, claim work, report status. Includes failure handling and security constraints. Replaces section-specific notes scattered across other docs with a single authoritative reference.
diff --git a/docs/smoke-checklist.md b/docs/smoke-checklist.md
new file mode 100644
index 0000000..f7e2354
--- /dev/null
+++ b/docs/smoke-checklist.md
@@ -0,0 +1,253 @@
+# Assignment-Layer Runtime Smoke Checklist
+
+> **Issue:** [misospace/mission-control#60](https://github.com/misospace/mission-control/issues/60)
+> **Date:** 2026-05-16
+> **Purpose:** Verify Mission Control is healthy before agents rely on it for assignment.
+
+This checklist documents the runtime smoke checks an operator or agent should run against a Mission Control instance to confirm the assignment layer is fully operational. Each check maps to a specific API endpoint, UI page, or log signal.
+
+Run all checks against the target instance (local dev, staging, or production) before cutover or after any deployment. Mark each as **PASS**, **FAIL**, or **SKIP** (with justification). All 14 checks must pass — or be explicitly skipped with documented reason — before trusting Mission Control for assignment decisions.
+
+---
+
+## Prerequisites
+
+- Mission Control instance is running and reachable at ``.
+- At least one repository is tracked (`GET /api/automation/repos` returns items, or `GITHUB_REPOSITORIES` env var was set).
+- At least one issue has been synced (`POST /api/sync` was run successfully at least once).
+- A test agent identity is available (e.g. `"smoke-test"`).
+
+---
+
+## Checklist
+
+### 1. Health endpoint returns healthy
+
+**Endpoint:** `GET /api/health`
+
+**Expected response:**
+```json
+{
+  "ok": true,
+  "database": "ok",
+  "version": "0.1.13"
+}
+```
+
+**Status code:** `200 OK`
+
+**Failure signal:** Any response with `ok: false`, `database: "error"`, or status `503`. This means the PostgreSQL database is unreachable but Mission Control itself is still running.
+
+---
+
+### 2. Automation sync succeeds
+
+**Endpoint:** `POST /api/automation/sync`
+
+**Request body:** `{}` (syncs all tracked repos) or `{ "repo": "owner/repo" }` (single repo).
+
+**Expected response:**
+```json
+{
+  "synced": 1,
+  "failed": 0,
+  "results": [{ "repo": "owner/repo", "result": { "success": true, "syncRunId": "" } }]
+}
+```
+
+**Failure signal:** `synced: 0` with non-zero `failed`, or any HTTP error.
+
+---
+
+### 3. Automation repos endpoint returns tracked repos
+
+**Endpoint:** `GET /api/automation/repos`
+
+**Expected response:** Array of repo objects, each containing `fullName`, `name`, `owner`, `defaultBranch`, `openPRCount`, `lastSyncedAt`.
+
+**Failure signal:** Empty array when at least one repo is expected, or HTTP error.
+
+---
+
+### 4. Issue sync returns syncedCount > 0
+
+**Endpoint:** `POST /api/sync`
+
+**Expected response:**
+```json
+{ "syncedCount": }
+```
+
+where `N > 0`.
+
+**Failure signal:** `syncedCount: 0` (no repos configured or sync failed), or HTTP error.
+
+---
+
+### 5. Issues endpoint returns issues
+
+**Endpoint:** `GET /api/issues`
+
+**Expected response:** Non-empty array of issue objects, each with `number`, `title`, `url`, `labels`, `repository`.
+
+**Failure signal:** Empty array when issues are expected, or HTTP error.
+
+---
+
+### 6. Board page shows issues
+
+**URL:** `/board`
+
+**Expected:** The Kanban board renders with issue cards grouped by status columns (`backlog`, `in-progress`, `in-review`, `done`). Filter bar is present and functional. Sync status indicator shows tracked repo count and cached issue count.
+
+**Failure signal:** Empty board, error overlay, or missing sync status indicator.
+
+---
+
+### 7. Projects page shows repo groups
+
+**URL:** `/projects`
+
+**Expected:** The Projects view renders with cards grouped by repository (e.g., `misospace/mission-control`, `misospace/miso-gallery`). Each card shows issue count and status breakdown columns.
+
+**Failure signal:** Empty state message ("No issues have been synced yet") when issues are expected, or error overlay.
+
+---
+
+### 8. Agent heartbeat/run events appear in Agents
+
+**URL:** `/agents`
+**Endpoint (alternative):** `GET /api/agent-runs?limit=10`
+
+**Expected:** Recent agent run entries with `agentName`, `runType`, `status`, `createdAt`. At minimum, the page or API shows that agent runs are being recorded.
+
+**Failure signal:** Empty list when runs have been submitted, or HTTP error.
+
+---
+
+### 9. Queue endpoint returns candidate issues
+
+**Endpoint:** `GET /api/agents//queue`
+
+**Expected response:** Non-empty array of issue objects with `number`, `title`, `url`, and `labels`. Issues should be ranked by priority and status according to the agent queue algorithm.
+
+**Failure signal:** Empty array when issues are expected, or HTTP error.
+
+---
+
+### 10. Claiming a low-risk test issue updates GitHub labels
+
+**Endpoint:** `POST /api/issues/claim`
+
+**Request body:**
+```json
+{
+  "issueId": "",
+  "repoFullName": "owner/repo",
+  "issueNumber": 123,
+  "agentName": "smoke-test",
+  "force": false
+}
+```
+
+**Expected response:** `{ "success": true, "labels": ["...", "agent/smoke-test", ...] }`
+
+**GitHub verification:** The issue on GitHub should have the `agent/smoke-test` label added. Optionally `status/in-progress` if no status label was present.
+
+**Failure signal:** HTTP error, or labels not reflected on GitHub after a few seconds.
+
+---
+
+### 11. Claiming writes an AuditLog entry
+
+**Endpoint:** `GET /api/audit?limit=50`
+
+**Expected response:** The most recent audit log entries include the claim action:
+```json
+{
+  "action": "claim_issue",
+  "actor": "smoke-test",
+  "repoFullName": "owner/repo",
+  "issueNumber": 123,
+  "success": true,
+  "beforeLabels": ["..."],
+  "afterLabels": ["...", "agent/smoke-test"]
+}
+```
+
+**Failure signal:** No `claim_issue` entry for the test claim in audit logs, or `success: false`.
+
+---
+
+### 12. Unclaiming or reverting the test issue works
+
+**Endpoint:** `POST /api/issues/unclaim`
+
+**Request body:**
+```json
+{
+  "issueId": "",
+  "repoFullName": "owner/repo",
+  "issueNumber": 123,
+  "agentName": "smoke-test"
+}
+```
+
+**Expected response:** `{ "success": true, "labels": ["..."] }` (without `agent/smoke-test`).
+
+**GitHub verification:** The `agent/smoke-test` label is removed from the issue on GitHub.
+
+**Audit log verification:** A new `unclaim_issue` entry appears with `success: true`.
+
+**Failure signal:** HTTP error, label persists on GitHub, or no audit log entry.
+
+---
+
+### 13. Logs show no Prisma, BigInt, or FK errors
+
+**Method:** Inspect Mission Control logs after running all checks above.
+
+**Expected:** No lines containing `PrismaClientKnownRequestError`, `PrismaClientUnknownRequestError`, `BigInt`, `ForeignKey`, `FK constraint`, `relation "X" does not exist`, or `column "Y" does not exist`.
+
+**Failure signal:** Any of the above error patterns in logs. These indicate schema drift, migration issues, or ORM misconfiguration that could silently corrupt data.
+
+---
+
+### 14. Mission Control failures do not break agent runs
+
+**Method:** Simulate failure by stopping Mission Control or making an endpoint return errors, then verify an agent can continue operating using GitHub as fallback.
+
+**Steps:**
+1. Stop Mission Control (or block network to it).
+2. Attempt the heartbeat workflow: `POST /api/sync` fails, `GET /api/issues` fails.
+3. Verify the agent falls back to GitHub Issues API directly and continues processing.
+4. Restart Mission Control. Verify the next heartbeat succeeds with `/api/sync`.
+
+**Expected:** Agent continues working through fallback path; no crash or hang when Mission Control is unavailable.
+
+**Failure signal:** Agent crashes, hangs, or fails its heartbeat entirely when Mission Control is down.
+
+---
+
+## Runbook: Interpreting Results
+
+| Result | Action |
+|--------|--------|
+| All PASS | Mission Control is ready for assignment. Proceed with cutover. |
+| 1–2 FAIL | Investigate failures. Re-run after fixes. Do not proceed to cutover. |
+| 3+ FAIL | Do not proceed. Block on critical failures (health, sync, issues, audit). |
+| Any SKIP | Document justification. Verify skipped checks were handled by alternative means. |
+
+### Common failure patterns
+
+- **Health check fails (503):** Database connection issue. Check `DATABASE_URL`, PostgreSQL status, and Prisma binary compatibility.
+- **Sync returns 0:** No repos configured. Set `GITHUB_REPOSITORIES` or add repos via `/api/automation/repos`.
+- **Issues empty after sync:** GitHub token may lack permissions for the target repos. Verify `GITHUB_TOKEN` scopes.
+- **Audit log missing entries:** Check Prisma schema for `AuditLog` model and confirm migrations are deployed (`prisma migrate deploy`).
+- **BigInt errors in logs:** Prisma version mismatch or schema using `BigInt` without proper type handling. Check `prisma/schema.prisma` for `@db.BigInt` fields.
+
+---
+
+## History
+
+- **2026-05-16** — Created as assignment-layer runtime smoke checklist (Issue #60). Documents all 14 checks covering health, sync, repos, issues, board UI, projects UI, agent runs, queue, claim/unclaim lifecycle, audit trail, log errors, and failure resilience.
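
The runbook table and the log patterns from check 13 can be tallied mechanically. The sketch below is one possible helper, not part of Mission Control: `scan_logs`, `verdict`, and the returned strings are hypothetical. It assumes results are recorded as a mapping from check number (1 to 14) to `"PASS"`, `"FAIL"`, or `"SKIP"`, with skip justifications kept in a second mapping.

```python
import re

# Patterns from check 13; the "X" and "Y" placeholders become "any quoted name".
# Note that a bare BigInt match will also flag benign lines that mention the type.
LOG_ERROR_RE = re.compile("|".join([
    r"PrismaClientKnownRequestError",
    r"PrismaClientUnknownRequestError",
    r"BigInt",
    r"ForeignKey",
    r"FK constraint",
    r'relation "[^"]+" does not exist',
    r'column "[^"]+" does not exist',
]))

EXPECTED_CHECKS = set(range(1, 15))  # the checklist defines checks 1 through 14


def scan_logs(lines):
    """Return the log lines that match any check-13 error pattern."""
    return [line for line in lines if LOG_ERROR_RE.search(line)]


def verdict(results, skip_reasons=None):
    """Map {check_number: "PASS" | "FAIL" | "SKIP"} to a runbook action."""
    skip_reasons = skip_reasons or {}
    missing = EXPECTED_CHECKS - set(results)
    if missing:
        return f"incomplete: no result for checks {sorted(missing)}"
    # A SKIP only counts as handled when a justification is documented.
    unjustified = sorted(
        n for n, r in results.items() if r == "SKIP" and not skip_reasons.get(n)
    )
    if unjustified:
        return f"blocked: SKIP without justification for checks {unjustified}"
    fails = sum(1 for r in results.values() if r == "FAIL")
    if fails >= 3:
        return "blocked: 3+ failures, do not proceed"
    if fails >= 1:
        return "investigate: fix and re-run, do not cut over"
    return "ready: proceed with cutover"
```

Treating an unjustified SKIP as blocking is a design choice taken from the checklist's requirement that skipped checks carry a documented reason; a team that reviews skips by hand could relax it.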