Summary
A self-hosted runner canary failed before the agent started because the MCP gateway container could not bind the fixed host port 8080:
failed to listen on 0.0.0.0:8080: listen tcp 0.0.0.0:8080: bind: address already in use
This looks like stale gh-aw container/runtime state remaining on a persistent self-hosted runner. The generated workflow should clean up gh-aw-owned containers/processes/ports when needed, or otherwise avoid fixed-port collisions, so subsequent jobs on the same runner do not fail during gateway startup.
Run details
- Workflow type: self-hosted Ubuntu 24.04 x64 host runner canary
- Run ID:
27450109858
- Job ID:
81143711103
- Event:
push
- Run attempt:
2
gh aw CLI: v0.79.6
- AWF/firewall:
v0.27.2
- MCP gateway image:
ghcr.io/github/gh-aw-mcpg:v0.3.25
- GitHub MCP server image:
ghcr.io/github/github-mcp-server:v1.1.2
What happened
I ran the debugger flow from debug.md:
gh aw audit 27450109858 --json
The audit reported:
- run status:
completed
- conclusion:
failure
- failing job:
agent
detection and safe_outputs skipped
- turns/tool usage:
0, because the agent never got past gateway startup
The generated workflow sets a fixed gateway port:
export MCP_GATEWAY_PORT="8080"
and starts the gateway with Docker host networking:
docker run -i --rm --network host ... ghcr.io/github/gh-aw-mcpg:v0.3.25
The gateway starts initialization and then exits:
[info] Gateway started with PID: 36949
[info] Waiting for gateway to initialize...
[error] ERROR: Gateway process (PID: 36949) exited during initialization
[INFO] Gateway will listen on 0.0.0.0:8080
[INFO] Command: ./awmg --routed --listen 0.0.0.0:8080 --config-stdin --log-dir /tmp/gh-aw/mcp-logs/
failed to listen on 0.0.0.0:8080: listen tcp 0.0.0.0:8080: bind: address already in use
Expected behavior
On persistent self-hosted runners, gh-aw should be resilient to leftover state from prior runs. Before starting containers/listening on fixed host ports, it should do one or more of the following:
- remove/stop stale gh-aw-owned containers that may still be running from previous jobs;
- clean up gh-aw-owned listeners/processes that are still bound to the gateway port;
- label/name containers with enough run metadata to safely identify stale gh-aw resources;
- allocate non-conflicting per-run ports instead of always using
8080; or
- fail with a diagnostic that identifies the container/process holding the port and suggests the cleanup command.
Why this matters
Hosted runners are ephemeral, but self-hosted runners persist across jobs. If gh-aw leaves containers or host-port listeners behind, later runs can fail before any agent turn, producing skipped detection/safe-output jobs and making the canary look like an agent failure when the issue is runner hygiene/runtime cleanup.
Summary
A self-hosted runner canary failed before the agent started because the MCP gateway container could not bind the fixed host port
8080:This looks like stale gh-aw container/runtime state remaining on a persistent self-hosted runner. The generated workflow should clean up gh-aw-owned containers/processes/ports when needed, or otherwise avoid fixed-port collisions, so subsequent jobs on the same runner do not fail during gateway startup.
Run details
2745010985881143711103push2gh awCLI:v0.79.6v0.27.2ghcr.io/github/gh-aw-mcpg:v0.3.25ghcr.io/github/github-mcp-server:v1.1.2What happened
I ran the debugger flow from
debug.md:The audit reported:
completedfailureagentdetectionandsafe_outputsskipped0, because the agent never got past gateway startupThe generated workflow sets a fixed gateway port:
and starts the gateway with Docker host networking:
The gateway starts initialization and then exits:
Expected behavior
On persistent self-hosted runners, gh-aw should be resilient to leftover state from prior runs. Before starting containers/listening on fixed host ports, it should do one or more of the following:
8080; orWhy this matters
Hosted runners are ephemeral, but self-hosted runners persist across jobs. If gh-aw leaves containers or host-port listeners behind, later runs can fail before any agent turn, producing skipped detection/safe-output jobs and making the canary look like an agent failure when the issue is runner hygiene/runtime cleanup.