Summary
gh-aw v0.74.4's runtime detection for ARC/DinD only adds
--docker-host-path-prefix /tmp/gh-aw to the AWF invocation when
DOCKER_HOST matches ^tcp:// (PR #31996, broadened from
^tcp://(localhost|127\.0\.0\.1) on 2026-05-13). On Actions Runner
Controller deployments where the runner pod mounts a sibling daemon
pod's docker socket as unix:///var/run/docker.sock, the regex never
fires, so --docker-host-path-prefix is never enabled. The agent
container then bind-mounts paths the daemon cannot see, and the run
fails.
This is the same root condition #30840 / #30838 / #28888 describe — the
runner and daemon filesystems are split — but in our shape DOCKER_HOST
is a unix socket because the docker socket is bind-mounted into the
runner pod from the daemon pod, not exposed over TCP.
Symptom
The agent harness fails immediately on prompt-file read:
[entrypoint] Executing command: /bin/bash -c '... --prompt-file /tmp/gh-aw/aw-prompts/prompt.txt'
[claude-harness] fatal: --prompt-file '/tmp/gh-aw/aw-prompts/prompt.txt'
is not readable: ENOENT: no such file or directory, stat '/tmp/gh-aw/aw-prompts/prompt.txt'
The activation phase wrote the prompt to /tmp/gh-aw/aw-prompts/prompt.txt
on the runner. AWF was launched with:
sudo -E awf --config "${RUNNER_TEMP}/gh-aw/awf-config.json" \
--container-workdir "${GITHUB_WORKSPACE}" \
--mount "${RUNNER_TEMP}/gh-aw:${RUNNER_TEMP}/gh-aw:ro" \
--mount "${RUNNER_TEMP}/gh-aw:/host${RUNNER_TEMP}/gh-aw:ro" \
${GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS} \
...
GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS was empty because this check:
if [[ "${DOCKER_HOST:-}" =~ ^tcp:// ]]; then
  GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS="--docker-host-path-prefix /tmp/gh-aw"
fi
did not match: DOCKER_HOST was unix:///var/run/docker.sock. AWF
therefore emitted bind-mounts of /tmp/gh-aw and ${RUNNER_TEMP}/gh-aw
that the daemon resolves against its own (different) filesystem.
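To make the gap concrete, here is a minimal sketch that applies the same ^tcp:// test to a few DOCKER_HOST values. The helper name prefix_args_for is hypothetical and exists only for this illustration; it is not part of gh-aw.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gh-aw): prints the prefix args that the
# ^tcp:// check would produce for a given DOCKER_HOST value, or nothing.
prefix_args_for() {
  local docker_host="$1"
  if [[ "$docker_host" =~ ^tcp:// ]]; then
    echo "--docker-host-path-prefix /tmp/gh-aw"
  fi
}

# Show the result for TCP endpoints, a unix socket, and an unset value.
for h in "tcp://localhost:2375" "tcp://dind:2375" "unix:///var/run/docker.sock" ""; do
  printf '%-32s -> [%s]\n' "${h:-<unset>}" "$(prefix_args_for "$h")"
done
```

Only the tcp:// values produce the flag. The unix-socket value yields an empty result, which is the state this runner ends up in.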
Probe results
We ran a diagnostic workflow (diag-runner-state.yml) on this runner
that demonstrates both halves of the failure mode:
DOCKER_HOST=unix:///var/run/docker.sock
srw-rw---- 1 root 2375 0 May 21 14:31 /var/run/docker.sock
probe likely TCP daemon endpoints
unreachable: tcp://localhost:2375
unreachable: tcp://localhost:2376
unreachable: tcp://127.0.0.1:2375
unreachable: tcp://127.0.0.1:2376
unreachable: tcp://dind:2375
unreachable: tcp://dind:2376
unreachable: tcp://docker-dind:2375
unreachable: tcp://dockerd:2375
cross-check daemon-side filesystem
-rw-r--r-- 1 runner runner 38 May 21 14:42 /tmp/runner-side-sentinel-900
--- as seen from a container with /tmp bind-mounted ---
ls: /tmp/runner-side-sentinel-900: No such file or directory
MISSING
Two confirmed facts:
- The daemon pod does not expose any standard docker TCP port that the
  runner can reach. The only TCP services visible to the runner pod
  are kube-apiserver and a buildkit daemon, not docker.
- A sentinel file written from the runner at /tmp/runner-side-sentinel-X
  is invisible inside a container with -v /tmp:/tmp. The daemon
  resolves /tmp against its own pod filesystem.
So the existing detection logic — "TCP DOCKER_HOST implies split
filesystems" — is correct in spirit but incomplete. Unix-socket
DOCKER_HOST also implies split filesystems when the socket is
bind-mounted from a sibling pod, which is a common ARC topology.
Reproduction
ARC RunnerScaleSet configured with a sibling daemon pod and the
docker socket bind-mounted into the runner. No special workflow config
required — any AWF-backed agent workflow will fail on the prompt-file
read inside the agent container.
Suggested fix
A pure regex on DOCKER_HOST is insufficient because unix-socket
sibling-DinD looks the same as host-local docker from the env-var
perspective. The runtime probe needs another signal.
Two options that both work:
1. Setup-time sentinel probe. Before composing the AWF flags, write a
   sentinel file to /tmp/gh-aw/.dind-probe-$$ on the runner, then run
   docker run --rm -v /tmp/gh-aw:/x alpine ls /x/.dind-probe-$$. If
   the sentinel is missing on the daemon side, the filesystems are
   split, and --docker-host-path-prefix should be engaged regardless
   of the DOCKER_HOST scheme. Clean up the sentinel afterwards. This
   costs one extra docker run per agent run, which is cheap.
2. Stat-based check. Compare stat -c '%d %i' /tmp/gh-aw on the runner
   with stat -c '%d %i' /x inside a container started with
   -v /tmp/gh-aw:/x. A different (device, inode) pair means a split
   filesystem, so engage the prefix path. This is slightly subtler
   than option 1.
Option 1 is more direct and harder to misread.
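Option 1 could be sketched roughly as follows. This is an illustration under stated assumptions, not the gh-aw implementation: the function names detect_split_fs and compose_prefix_args are hypothetical, the alpine image is assumed to be pullable by the daemon, and any docker run failure (including an unreachable daemon or a failed pull) is treated as "sentinel not visible".

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the sentinel probe; names are illustrative only.

# Returns 0 if the daemon cannot see a file the runner just wrote (split
# filesystems), 1 if it can, 2 if the sentinel could not be created.
detect_split_fs() {
  local dir="${1:-/tmp/gh-aw}"
  local sentinel=".dind-probe-$$"
  mkdir -p "$dir" || return 2
  : > "$dir/$sentinel" || return 2
  local rc=1
  # Assumption: any failure of docker run counts as "not visible".
  if ! docker run --rm -v "$dir:/x" alpine ls "/x/$sentinel" >/dev/null 2>&1; then
    rc=0
  fi
  rm -f "$dir/$sentinel"   # always clean up the sentinel on the runner
  return "$rc"
}

# Prints the AWF flag when either the existing tcp:// check matches or the
# probe reports a split filesystem; prints nothing otherwise.
compose_prefix_args() {
  local dir="${1:-/tmp/gh-aw}"
  if [[ "${DOCKER_HOST:-}" =~ ^tcp:// ]] || detect_split_fs "$dir"; then
    echo "--docker-host-path-prefix $dir"
  fi
}

# Usage (not executed here):
#   GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS="$(compose_prefix_args /tmp/gh-aw)"
```

Keeping the tcp:// check first means existing TCP setups skip the extra docker run, and the probe only runs in the ambiguous unix-socket case. A real implementation would probably want to distinguish "sentinel missing" from "docker run failed for another reason" rather than folding both into one result.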
A third, less elegant option: an explicit frontmatter override
(sandbox.compatibility: arc-dind per the proposal in #30840) so users
can force the path-prefix path on. We've been working around the issue
locally by tar-piping /tmp/gh-aw and ${RUNNER_TEMP}/gh-aw from the
runner into the daemon filesystem before AWF launches, but that's the
sort of hand-rolled workaround #30840 explicitly calls out as too
brittle for normal users.
Why not just expose TCP on the daemon pod?
Cluster admins controlling the runner image sometimes will, sometimes
won't. On the deployment we hit this on (trusted-environments-gh-aw
running on a private GHES instance), the daemon pod is configured
socket-only and we don't have permission to change it. The unix-socket
sibling-DinD shape is supportable in principle — gh-aw just can't
detect it yet.
Environment
- gh-aw v0.74.4, AWF v0.25.46
- Runner: ARC trusted-environments-gh-aw on Kubernetes, Ubuntu 24.04,
  Docker 29.5.1
- DOCKER_HOST: unix:///var/run/docker.sock (bind-mounted from sibling
  daemon pod)
Related
- [...] general problem and lists similar workflow-side staging hacks
- [...] ^tcp://, which fixes K8s-service-named TCP daemons but still
  misses unix-socket sibling-DinD
- docs/adr/31614-auto-detect-arc-dind-docker-host-path-prefix.md