feat(factory): own state-name defaults + ship canary scheduling tooling #11
Changes from all commits
2aa388f
c0faf45
4e88540
cd6a16c
901c295
f76a311
6690132
687587e
7db94ce
a9be2b6
afdde78
211d1af
211e45e
2792096
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| <?xml version="1.0" encoding="UTF-8"?> | ||
| <!-- | ||
| launchd template for the factory sync-fidelity canary (see factory-canary.sh). | ||
|
|
||
| Install (macOS): | ||
| 1. Replace __FACTORY_WORKDIR__ with your factory deployment dir (the one with | ||
| factory.config.json), and __FACTORY_BIN__ with the path to factory.mjs | ||
| (e.g. node_modules/@agent-relay/factory/bin/factory.mjs). Set the issue key | ||
| and, optionally, FACTORY_CANARY_SLACK_WEBHOOK. | ||
| 2. cp scripts/com.agentrelay.factory-canary.plist.example \ | ||
| ~/Library/LaunchAgents/com.agentrelay.factory-canary.plist | ||
| 3. launchctl load ~/Library/LaunchAgents/com.agentrelay.factory-canary.plist | ||
| 4. launchctl start com.agentrelay.factory-canary # run once now to verify | ||
|
|
||
| Runs every 6h (StartInterval 21600). Logs to <workdir>/.factory-canary.log. | ||
| Run from the deployment dir so the canary reuses the running relay broker. | ||
| --> | ||
| <plist version="1.0"> | ||
| <dict> | ||
| <key>Label</key> | ||
| <string>com.agentrelay.factory-canary</string> | ||
|
|
||
| <key>ProgramArguments</key> | ||
| <array> | ||
| <string>/bin/bash</string> | ||
| <string>__FACTORY_WORKDIR__/scripts/factory-canary.sh</string> | ||
| </array> | ||
|
|
||
| <key>WorkingDirectory</key> | ||
| <string>__FACTORY_WORKDIR__</string> | ||
|
|
||
| <key>EnvironmentVariables</key> | ||
| <dict> | ||
| <key>FACTORY_CANARY_ISSUE</key> | ||
| <string>AR-305</string> | ||
| <key>FACTORY_WORKDIR</key> | ||
| <string>__FACTORY_WORKDIR__</string> | ||
| <key>FACTORY_BIN</key> | ||
| <string>__FACTORY_BIN__</string> | ||
| <key>FACTORY_CANARY_SLACK_WEBHOOK</key> | ||
| <string></string> | ||
| </dict> | ||
|
|
||
| <key>StartInterval</key> | ||
| <integer>21600</integer> | ||
| <key>RunAtLoad</key> | ||
| <true/> | ||
| <key>StandardOutPath</key> | ||
| <string>__FACTORY_WORKDIR__/.factory-canary.log</string> | ||
| <key>StandardErrorPath</key> | ||
| <string>__FACTORY_WORKDIR__/.factory-canary.log</string> | ||
| </dict> | ||
| </plist> |
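
Install step 1 in the template comment amounts to substituting two placeholders before the file is copied into `~/Library/LaunchAgents`. A minimal sketch of that substitution, assuming a hypothetical helper named `render_canary_plist` (not part of this PR) and paths that contain no `|`, `&`, or backslash characters:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): fill the two placeholders in the
# launchd template and print the rendered plist to stdout.
render_canary_plist() {
  local template="$1" workdir="$2" bin="$3"
  # '|' is the sed delimiter because the values are paths containing '/'.
  # Assumption: neither path contains '|', '&', or a backslash.
  sed -e "s|__FACTORY_WORKDIR__|${workdir}|g" \
      -e "s|__FACTORY_BIN__|${bin}|g" \
      "$template"
}
```

It could be used as `render_canary_plist scripts/com.agentrelay.factory-canary.plist.example "$PWD" "$PWD/node_modules/@agent-relay/factory/bin/factory.mjs" > ~/Library/LaunchAgents/com.agentrelay.factory-canary.plist`. The issue key and the optional Slack webhook still need to be set by hand.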
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| #!/usr/bin/env bash | ||
| # | ||
| # factory-canary.sh — scheduled sync-fidelity regression detector. | ||
| # | ||
| # Runs `factory canary <issue>` against the LIVE relayfile mount and asserts a | ||
| # known "Ready for Agent" issue is still classified dispatch-ready by the real | ||
| # triage path. If it ever flips to "skipped" (e.g. the Linear sync regresses to | ||
| # sparse records with no state.id), this exits non-zero and alerts — catching the | ||
| # regression before it silently blocks every factory dispatch. | ||
| # | ||
| # Run it on a schedule (cron/launchd) from your factory deployment directory — | ||
| # the one holding factory.config.json, where the relayfile mount + relay broker | ||
| # already live (so the canary reuses the running broker rather than spawning one). | ||
| # See scripts/com.agentrelay.factory-canary.plist.example for a launchd template. | ||
| # | ||
| # Config (env vars): | ||
| # FACTORY_CANARY_ISSUE Linear issue key to check (default: the first arg) | ||
| # FACTORY_WORKDIR deployment dir with factory.config.json (default: cwd) | ||
| # FACTORY_CONFIG config path, relative to FACTORY_WORKDIR (default: factory.config.json) | ||
| # FACTORY_BIN path to factory.mjs (default: this repo's bin/factory.mjs) | ||
| # FACTORY_BACKEND --backend value (default: internal) | ||
| # FACTORY_CANARY_TIMEOUT seconds before the canary is considered hung (default: 180) | ||
| # FACTORY_CANARY_SLACK_WEBHOOK optional Slack incoming-webhook URL for failure alerts | ||
| # | ||
| # Exit codes: 0 = dispatch-ready (healthy); 1 = NOT ready / error / hung. | ||
|
|
||
| set -uo pipefail | ||
|
|
||
| SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| ISSUE="${FACTORY_CANARY_ISSUE:-${1:-}}" | ||
| WORKDIR="${FACTORY_WORKDIR:-$PWD}" | ||
| CONFIG="${FACTORY_CONFIG:-factory.config.json}" | ||
| BIN="${FACTORY_BIN:-$SCRIPT_DIR/../bin/factory.mjs}" | ||
| BACKEND="${FACTORY_BACKEND:-internal}" | ||
| TIMEOUT="${FACTORY_CANARY_TIMEOUT:-180}" | ||
| TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" | ||
|
|
||
| if [[ -z "$ISSUE" ]]; then | ||
| echo "[$TS] factory-canary: no issue key (set FACTORY_CANARY_ISSUE or pass an arg)" >&2 | ||
| exit 1 | ||
| fi | ||
| if [[ ! -f "$BIN" ]]; then | ||
| echo "[$TS] factory-canary: factory bin not found at $BIN" >&2 | ||
| exit 1 | ||
| fi | ||
| cd "$WORKDIR" || { echo "[$TS] factory-canary: cannot cd to $WORKDIR" >&2; exit 1; } | ||
|
|
||
| # The canary runs the real dry-run triage path (no agents spawned) and prints a | ||
| # JSON verdict {ok,issue,status,reason}; its exit code mirrors ok. | ||
| RUN=(node "$BIN" factory canary "$ISSUE" --config "$CONFIG" --backend "$BACKEND") | ||
| # A hung run (broker/mount wedge) MUST be bounded — an unbounded canary on a | ||
| # scheduler (launchd/cron) can wedge the slot forever and suppress later alerts. | ||
| # macOS has no `timeout` by default; coreutils ships it as `gtimeout`. If neither | ||
| # is present, fail closed rather than run without a deadline. | ||
| TIMEOUT_BIN="" | ||
| if command -v timeout >/dev/null 2>&1; then | ||
| TIMEOUT_BIN="timeout" | ||
| elif command -v gtimeout >/dev/null 2>&1; then | ||
| TIMEOUT_BIN="gtimeout" | ||
| fi | ||
|
|
||
| if [[ -z "$TIMEOUT_BIN" ]]; then | ||
| echo "[$TS] factory-canary: no timeout utility found (install coreutils for 'timeout'/'gtimeout'); refusing to run unbounded" >&2 | ||
| exit 1 | ||
| fi | ||
| OUT="$("$TIMEOUT_BIN" "$TIMEOUT" "${RUN[@]}" 2>/dev/null)" | ||
| CODE=$? | ||
| if [[ $CODE -eq 124 ]]; then | ||
| echo "[$TS] factory-canary: TIMED OUT after ${TIMEOUT}s (broker/mount may be wedged)" >&2 | ||
| fi | ||
|
|
||
| # The CLI prints a pretty-printed (multi-line) JSON verdict, so parse the whole | ||
| # output — not just the last line (which is only the closing `}`). | ||
| echo "[$TS] factory-canary $ISSUE -> exit $CODE" | ||
| [[ $CODE -eq 0 ]] && exit 0 | ||
|
|
||
| REASON="$(printf '%s' "$OUT" | node -e 'let s="";process.stdin.on("data",d=>s+=d).on("end",()=>{try{const v=JSON.parse(s);console.log(`${v.status||"error"}: ${v.reason||"unknown"}`)}catch{console.log("unparseable verdict")}})' 2>/dev/null)" | ||
| MSG=":rotating_light: factory canary FAILED for ${ISSUE} — ${REASON}. Sync fidelity may have regressed (issue no longer dispatch-ready)." | ||
| echo "[$TS] $MSG" >&2 | ||
|
|
||
| if [[ -n "${FACTORY_CANARY_SLACK_WEBHOOK:-}" ]]; then | ||
| curl -sS -m 15 -X POST -H 'Content-type: application/json' \ | ||
| --data "$(node -e 'process.stdout.write(JSON.stringify({text:process.argv[1]}))' "$MSG")" \ | ||
| "$FACTORY_CANARY_SLACK_WEBHOOK" >/dev/null 2>&1 \ | ||
| && echo "[$TS] factory-canary: posted Slack alert" >&2 \ | ||
| || echo "[$TS] factory-canary: Slack alert post failed" >&2 | ||
| fi | ||
|
|
||
| exit 1 | ||
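
The fail-closed deadline logic in the script can be tried in isolation. The sketch below wraps the same `timeout`/`gtimeout` lookup in a hypothetical function, `run_bounded` (a name introduced here for illustration only). It assumes the GNU coreutils convention that the utility exits with status 124 when the deadline expires, which is the same status the script tests for.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): run a command under a deadline and
# refuse to run at all when no timeout utility is installed.
run_bounded() {
  local seconds="$1"
  shift
  local bin=""
  if command -v timeout >/dev/null 2>&1; then
    bin="timeout"
  elif command -v gtimeout >/dev/null 2>&1; then
    bin="gtimeout"   # coreutils name on macOS
  else
    echo "no timeout utility found; refusing to run unbounded" >&2
    return 1
  fi
  # Exit status is the command's own, or 124 if the deadline expired.
  "$bin" "$seconds" "$@"
}
```

For example, `run_bounded 1 sleep 5` should return 124 after about one second, while `run_bounded 5 true` returns 0.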
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -98,10 +98,22 @@ const babysitterSchema = z.object({ | |
| enabled: z.boolean().default(false), | ||
| }).default({}) | ||
|
|
||
| // The factory owns its workflow-state NAME conventions; consumers (e.g. pear) | ||
| // don't hand-configure them. These names let the factory resolve a role from a | ||
| // synced record that carries state.name but no state.id (sparse-sync fallback). | ||
| // A workspace that names states differently can override via config. | ||
| const DEFAULT_LINEAR_STATE_NAMES = { | ||
| readyForAgent: 'Ready for Agent', | ||
| agentImplementing: 'Agent Implementing', | ||
| done: 'Done', | ||
| inPlanning: 'In Planning', | ||
| humanReview: 'In Human Review', | ||
| } | ||
|
|
||
| const linearSchema = z.object({ | ||
| - states: linearRoleNamesSchema, | ||
| + states: linearRoleNamesSchema.default(DEFAULT_LINEAR_STATE_NAMES), | ||
|
Defaulting Useful? React with 👍 / 👎. |
||
| statesByTeam: z.record(z.string(), linearRoleNamesSchema).default({}), | ||
| - }).default({}) | ||
| + }).default({ states: DEFAULT_LINEAR_STATE_NAMES, statesByTeam: {} }) | ||
|
|
||
| const stateIdsSchema = z.object({ | ||
| readyForAgent: z.string().optional(), | ||
|
|
||
This new operational entrypoint is advertised in the README, but `package.json` still has a `files` allowlist containing only `dist`, `bin/factory.mjs`, `package.json`, and `README.md`, so `npm publish` will omit both `scripts/factory-canary.sh` and the launchd plist. For users deploying `@agent-relay/factory` from the package, the documented `__FACTORY_WORKDIR__/scripts/factory-canary.sh` path will not exist, so the scheduled canary cannot be installed unless these scripts are added to the package or documented as repo-only.
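
One way to catch this kind of packaging gap before publishing is to compare the list of packed files against the paths the docs rely on. A minimal sketch, assuming a hypothetical helper `check_packed` (not part of this PR) that reads newline-separated packed paths on stdin and takes the required paths as arguments:

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of the PR): fail if any required path is absent
# from the newline-separated list of packed files read from stdin.
check_packed() {
  local packed path status=0
  packed="$(cat)"
  for path in "$@"; do
    # -x: whole-line match, -F: treat the path as a fixed string, not a regex.
    if ! printf '%s\n' "$packed" | grep -qxF -- "$path"; then
      echo "missing from package: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```

The input list could come from `npm pack --dry-run --json`, assuming its output exposes a `files` array with `path` entries that is flattened to one path per line before being piped in. A call such as `check_packed scripts/factory-canary.sh scripts/com.agentrelay.factory-canary.plist.example` would then exit non-zero while the `files` allowlist omits those scripts.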