Commit cafd0b0

fix: P4 exit code masking, SIGKILL escalation, error annotations
1 parent: b882daa. Commit: cafd0b0

2 files changed: 24 additions, 4 deletions

docs/workflow-issues.md

Lines changed: 6 additions & 1 deletion
@@ -63,7 +63,12 @@ Now executing **Assignment 3: create-project-structure**. This requires Python a
 
 ## P4: /orchestrate-project-setup timeout and completion issues
 
-**Status: PARTIALLY FIXED** - `--thinking` + `/proc/io` watchdog fixes prevent false-positive kills during subagent work; genuine subagent stalls still cause 15m idle kill
+**Status: FIXED** (commit TBD in `ai-new-workflow-app-template`) - three infrastructure bugs fixed:
+1. **Exit code masking**: Idle-killed runs exited 0, making GitHub Actions report "succeeded" despite incomplete work. Now exits 1.
+2. **`::warning::` → `::error::`**: Idle kills and hard-ceiling kills are failures - annotated as `::error::` so they surface in the workflow summary.
+3. **SIGTERM→SIGKILL escalation**: After sending SIGTERM, waits 10s then sends SIGKILL if the process hasn't exited, preventing zombie/hung processes.
+
+Combined with P5's watchdog race-condition fix, the idle kill is now reliable: it won't false-positive during active subagent work (P5), it properly reports failure when it does fire (P4), and it cleans up stuck processes.
 
 ### Delta86 analysis (run 23332933790 - succeeded in 26m 14s)
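
Item 2 in the status note relies on GitHub Actions workflow commands: a log line that starts with `::error::` or `::warning::` becomes an annotation of that severity. A minimal sketch of that convention follows, assuming a hypothetical `annotate` helper that is not part of this commit; the script itself writes the prefixes inline with `echo`.

```shell
# Hypothetical helper (not part of this commit): print a GitHub Actions
# workflow command for a known severity, or a plain log line otherwise.
annotate() {
  level=$1
  shift
  case "$level" in
    error|warning|notice) printf '::%s::%s\n' "$level" "$*" ;;
    *) printf '%s\n' "$*" ;;
  esac
}

# Watchdog kills are failures, so they are reported as errors; the SIGKILL
# escalation notice stays a warning, matching the diff in this commit.
annotate error "opencode idle for 15m (no output from client or server); terminating"
annotate warning "opencode did not exit after SIGTERM; sending SIGKILL"
```

Routing every message through one function would keep the severity choice in a single place, at the cost of one more indirection in a script that currently uses plain `echo`.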

run_opencode_prompt.sh

Lines changed: 18 additions & 3 deletions
@@ -310,17 +310,29 @@ while kill -0 "$OPENCODE_PID" 2>/dev/null; do
 
   if [[ $elapsed -ge $HARD_CEILING_SECS ]]; then
     echo ""
-    echo "::warning::opencode hit ${HARD_CEILING_SECS}s hard ceiling; terminating"
+    echo "::error::opencode hit ${HARD_CEILING_SECS}s hard ceiling; terminating"
     kill "$OPENCODE_PID" 2>/dev/null
+    # Escalate to SIGKILL if SIGTERM doesn't work within 10s
+    sleep 10
+    if kill -0 "$OPENCODE_PID" 2>/dev/null; then
+      echo "::warning::opencode did not exit after SIGTERM; sending SIGKILL"
+      kill -9 "$OPENCODE_PID" 2>/dev/null
+    fi
     IDLE_KILLED=1
     break
   fi
 
   # Idle detection: only trigger when BOTH client output and server are stale
   if [[ $idle -ge $IDLE_TIMEOUT_SECS ]]; then
     echo ""
-    echo "::warning::opencode idle for $(( idle / 60 ))m (no output from client or server); terminating"
+    echo "::error::opencode idle for $(( idle / 60 ))m (no output from client or server); terminating"
     kill "$OPENCODE_PID" 2>/dev/null
+    # Escalate to SIGKILL if SIGTERM doesn't work within 10s
+    sleep 10
+    if kill -0 "$OPENCODE_PID" 2>/dev/null; then
+      echo "::warning::opencode did not exit after SIGTERM; sending SIGKILL"
+      kill -9 "$OPENCODE_PID" 2>/dev/null
+    fi
     IDLE_KILLED=1
     break
   fi
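
The escalation block appears twice in this hunk, and each copy sleeps the full 10s even when the process exits right after SIGTERM. One possible refactor, sketched here as a suggestion rather than as what the commit does, is a shared helper that polls once per second and returns as soon as the process is gone. The name `terminate_with_escalation` and its grace-period argument are hypothetical, and the polling assumes the target is a child of the calling shell so that the shell reaps it once it exits.

```shell
# Hypothetical helper (not part of this commit). Sends SIGTERM, polls once per
# second for up to grace_secs, then sends SIGKILL if the process is still alive.
terminate_with_escalation() {
  target_pid=$1
  grace_secs=${2:-10}
  waited=0

  # If SIGTERM cannot be delivered, treat the process as already gone.
  kill "$target_pid" 2>/dev/null || return 0

  while kill -0 "$target_pid" 2>/dev/null; do
    if [ "$waited" -ge "$grace_secs" ]; then
      echo "::warning::process $target_pid did not exit after SIGTERM; sending SIGKILL"
      kill -9 "$target_pid" 2>/dev/null || true
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Under these assumptions, both branches could call `terminate_with_escalation "$OPENCODE_PID" 10` in place of the `kill` / `sleep 10` / `kill -9` sequence, which would also keep the two copies from drifting apart.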
@@ -372,8 +384,11 @@ rm -f "$OUTPUT_LOG"
 
 set -e
 
+# Exit non-zero on idle kill so the workflow properly reports failure.
+# Previously this was `exit 0` which masked SIGTERM (143) as success,
+# causing incomplete runs to appear as "succeeded" in GitHub Actions.
 if [[ $IDLE_KILLED -eq 1 ]]; then
-  exit 0
+  exit 1
 fi
 
 exit ${OPENCODE_EXIT}
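
The comment in this hunk mentions status 143. By shell convention, a process terminated by signal N is reported as 128 + N, so SIGTERM (15) shows up as 143 and SIGKILL (9) as 137. The sketch below reproduces that status and the mapping applied by the new `exit 1`. The helper `final_exit_status` and the `demo_*` variables are hypothetical names for illustration, not code from `run_opencode_prompt.sh`.

```shell
# Hypothetical helper (not part of this commit): choose the script's final
# exit status. A watchdog kill always maps to 1; otherwise the child's own
# status passes through unchanged.
final_exit_status() {
  idle_killed=$1
  child_status=$2
  if [ "$idle_killed" -eq 1 ]; then
    echo 1
  else
    echo "$child_status"
  fi
}

# Demonstration: terminate a background child with SIGTERM and collect the
# status that `wait` reports for it.
sleep 30 &
demo_pid=$!
kill "$demo_pid"
demo_status=0
wait "$demo_pid" || demo_status=$?

echo "child status after SIGTERM: $demo_status"
echo "script exit status after a watchdog kill: $(final_exit_status 1 "$demo_status")"
```

Collapsing every watchdog kill to 1 drops the distinction between a SIGTERM exit and a SIGKILL exit. That seems acceptable here because the `::error::` annotation already records why the run was stopped.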
