[ARC-DinD] auto-detection should also engage --docker-host-path-prefix when DOCKER_HOST is a unix socket from a sibling daemon pod

## Summary

gh-aw v0.74.4's runtime detection for ARC/DinD only adds
`--docker-host-path-prefix /tmp/gh-aw` to the AWF invocation when
`DOCKER_HOST` matches `^tcp://` (PR #31996, broadened from
`^tcp://(localhost|127\.0\.0\.1)` on 2026-05-13). On Actions Runner
Controller deployments where the runner pod mounts a sibling daemon
pod's docker socket as `unix:///var/run/docker.sock`, the regex never
fires, so `--docker-host-path-prefix` is never enabled. The agent
container then bind-mounts paths the daemon cannot see, and the run
fails.

This is the same root condition that #30840, #30838, and #28888 describe
(the runner and daemon filesystems are split), but in our topology
`DOCKER_HOST` is a unix socket because the docker socket is bind-mounted
into the runner pod from the daemon pod rather than exposed over TCP.

## Symptom

The agent harness fails immediately on prompt-file read:

```
[entrypoint] Executing command: /bin/bash -c '... --prompt-file /tmp/gh-aw/aw-prompts/prompt.txt'
[claude-harness] fatal: --prompt-file '/tmp/gh-aw/aw-prompts/prompt.txt'
  is not readable: ENOENT: no such file or directory, stat '/tmp/gh-aw/aw-prompts/prompt.txt'
```

The activation phase wrote the prompt to `/tmp/gh-aw/aw-prompts/prompt.txt`
on the runner. AWF was then launched with:

```bash
sudo -E awf --config "${RUNNER_TEMP}/gh-aw/awf-config.json" \
  --container-workdir "${GITHUB_WORKSPACE}" \
  --mount "${RUNNER_TEMP}/gh-aw:${RUNNER_TEMP}/gh-aw:ro" \
  --mount "${RUNNER_TEMP}/gh-aw:/host${RUNNER_TEMP}/gh-aw:ro" \
  ${GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS} \
  ...
```

`GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS` was empty because:

```bash
if [[ "${DOCKER_HOST:-}" =~ ^tcp:// ]]; then
  GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS="--docker-host-path-prefix /tmp/gh-aw"
fi
```

did not match: `DOCKER_HOST` was `unix:///var/run/docker.sock`. AWF
therefore emitted bind-mounts of `/tmp/gh-aw` and `${RUNNER_TEMP}/gh-aw`,
which the daemon resolves against its own, separate filesystem.
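
The miss is easy to reproduce in isolation. The sketch below applies the same `^tcp://` test to a few `DOCKER_HOST` values. `docker_host_is_tcp` is a hypothetical helper name used only for illustration; it is not part of gh-aw.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the value matches the regex
# quoted above from the gh-aw v0.74.4 detection step.
docker_host_is_tcp() {
  [[ "${1:-}" =~ ^tcp:// ]]
}

# Try a loopback TCP daemon, a service-named TCP daemon, our unix socket,
# and the unset case.
for host in "tcp://localhost:2375" "tcp://dind:2375" "unix:///var/run/docker.sock" ""; do
  if docker_host_is_tcp "$host"; then
    echo "prefix engaged: '${host}'"
  else
    echo "prefix skipped: '${host}'"
  fi
done
```

Both `tcp://` values engage the prefix. The unix socket and the unset case fall through identically, so the env var alone cannot tell sibling-DinD apart from host-local docker.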

## Probe results

We ran a diagnostic workflow (`diag-runner-state.yml`) on this runner
that demonstrates both halves of the failure mode:

```
DOCKER_HOST=unix:///var/run/docker.sock
srw-rw---- 1 root 2375 0 May 21 14:31 /var/run/docker.sock

probe likely TCP daemon endpoints
  unreachable: tcp://localhost:2375
  unreachable: tcp://localhost:2376
  unreachable: tcp://127.0.0.1:2375
  unreachable: tcp://127.0.0.1:2376
  unreachable: tcp://dind:2375
  unreachable: tcp://dind:2376
  unreachable: tcp://docker-dind:2375
  unreachable: tcp://dockerd:2375

cross-check daemon-side filesystem
  -rw-r--r-- 1 runner runner 38 May 21 14:42 /tmp/runner-side-sentinel-900
  --- as seen from a container with /tmp bind-mounted ---
  ls: /tmp/runner-side-sentinel-900: No such file or directory
  MISSING
```

Two confirmed facts:

1. The daemon pod does not expose any standard docker TCP port that the
   runner can reach. The only TCP services visible to the runner pod
   are kube-apiserver and a buildkit daemon — not docker.
2. A sentinel file written from the runner at `/tmp/runner-side-sentinel-X`
   is invisible inside a container with `-v /tmp:/tmp`. The daemon
   resolves `/tmp` against its own pod filesystem.

So the existing detection logic — "TCP DOCKER_HOST implies split
filesystems" — is correct in spirit but incomplete. Unix-socket
DOCKER_HOST also implies split filesystems when the socket is
bind-mounted from a sibling pod, which is a common ARC topology.
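
For anyone who wants to repeat the TCP half of the probe, a rough equivalent is sketched below. It is not the actual contents of `diag-runner-state.yml`. `probe_tcp` and `probe_candidates` are hypothetical names, and the sketch assumes a bash built with `/dev/tcp` support plus coreutils `timeout`.

```bash
#!/usr/bin/env bash
# Sketch only: a successful connect shows that something is listening,
# not that the listener speaks the Docker API.
probe_tcp() {
  local host="$1" port="$2"
  # Open (and implicitly close) a TCP connection with a 2-second budget.
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: tcp://${host}:${port}"
  else
    echo "unreachable: tcp://${host}:${port}"
  fi
}

# Walk the same candidate endpoints listed in the probe output above.
probe_candidates() {
  local endpoint
  for endpoint in localhost:2375 localhost:2376 127.0.0.1:2375 127.0.0.1:2376 \
                  dind:2375 dind:2376 docker-dind:2375 dockerd:2375; do
    probe_tcp "${endpoint%%:*}" "${endpoint##*:}"
  done
}
```

Calling `probe_candidates` from a workflow step prints one line per endpoint in the same `reachable:` / `unreachable:` shape as the output above.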

## Reproduction

ARC `RunnerScaleSet` configured with a sibling daemon pod and the
docker socket bind-mounted into the runner. No special workflow config
required — any AWF-backed agent workflow will fail on the prompt-file
read inside the agent container.

## Suggested fix

A pure regex on `DOCKER_HOST` is insufficient because unix-socket
sibling-DinD looks the same as host-local docker from the env-var
perspective. The runtime probe needs another signal.

Two options that both work:

1. **Setup-time sentinel probe.** Before composing the AWF flags, write a
   sentinel file to `/tmp/gh-aw/.dind-probe-$$` on the runner, then run
   `docker run --rm -v /tmp/gh-aw:/x alpine ls /x/.dind-probe-$$`. If
   the sentinel is missing on the daemon side, the daemon filesystem is
   split, and `--docker-host-path-prefix` should be engaged regardless
   of the `DOCKER_HOST` scheme. Clean up the sentinel afterwards. This
   costs one extra `docker run` per agent run, which is cheap.

2. **Stat-based check.** `stat -c '%d %i' /tmp/gh-aw` on the runner vs.
   the same call inside a container with `-v /tmp/gh-aw:/x`. Different
   `(device, inode)` ⇒ split filesystem ⇒ engage prefix path. Slightly
   subtler than option 1.

Option 1 is more direct and harder to misread.
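
A minimal sketch of option 1 follows, to make the proposal concrete. The function names and the `GH_AW_PROBE_DOCKER` override are hypothetical and exist only for this illustration. It assumes the `alpine` image can be pulled by the daemon. Unlike the bare `ls` above, it has the container print `present` or `missing`, so that a failed `docker run` is not mistaken for a split filesystem.

```bash
#!/usr/bin/env bash
# Sketch of option 1. Function names and GH_AW_PROBE_DOCKER are hypothetical.

# Exit status: 0 = daemon cannot see runner files (split filesystem),
#              1 = daemon sees the sentinel (shared filesystem),
#              2 = probe inconclusive (docker failed, image pull failed, ...).
gh_aw_daemon_fs_is_split() {
  local dir="${1:-/tmp/gh-aw}"
  local docker_cmd="${GH_AW_PROBE_DOCKER:-docker}"  # override point for testing
  local sentinel=".dind-probe-$$"
  local out=""

  mkdir -p "$dir" && : > "${dir}/${sentinel}" || return 2

  # The container reports whether the sentinel is visible through the bind mount.
  out="$("$docker_cmd" run --rm -v "${dir}:/x:ro" alpine \
    sh -c "[ -e '/x/${sentinel}' ] && echo present || echo missing" 2>/dev/null)" || true

  rm -f "${dir}/${sentinel}"

  case "$out" in
    missing) return 0 ;;
    present) return 1 ;;
    *)       return 2 ;;
  esac
}

# Prints the AWF flag when either the existing TCP check or the probe fires.
gh_aw_docker_host_path_prefix_args() {
  local dir="${1:-/tmp/gh-aw}"
  if [[ "${DOCKER_HOST:-}" =~ ^tcp:// ]] || gh_aw_daemon_fs_is_split "$dir"; then
    printf '%s\n' "--docker-host-path-prefix ${dir}"
  fi
}
```

The setup step could then set `GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS="$(gh_aw_docker_host_path_prefix_args /tmp/gh-aw)"`. An inconclusive probe is treated as "not split", so runners where the probe cannot run keep today's behavior.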

A third, less elegant option: an explicit frontmatter override
(`sandbox.compatibility: arc-dind` per the proposal in #30840) so users
can force the path-prefix behavior on. We've been working around the
issue locally by tar-piping `/tmp/gh-aw` and `${RUNNER_TEMP}/gh-aw` from
the runner into the daemon filesystem before AWF launches, but that's
the sort of hand-rolled workaround #30840 explicitly calls out as too
brittle for normal users.

## Why not just expose TCP on the daemon pod?

Some cluster admins who control the runner image will do that and some
won't. On the deployment where we hit this (`trusted-environments-gh-aw`
running on a private GHES instance), the daemon pod is configured
socket-only and we don't have permission to change it. The unix-socket
sibling-DinD shape is supportable in principle; gh-aw just can't
detect it yet.

## Related

- #30840 — first-class ARC support, closed-completed; describes the
  general problem and lists similar workflow-side staging hacks
- #30838 — AWF native ARC/DinD support
- #28888 — MCP gateway socket discovery on ARC
- PR #31996 (merged 2026-05-13) — broadened the regex to `^tcp://`,
  which fixes K8s-service-named TCP daemons but still misses
  unix-socket sibling-DinD
- ADR `docs/adr/31614-auto-detect-arc-dind-docker-host-path-prefix.md`

## Environment

- gh-aw `v0.74.4`, AWF `v0.25.46`
- Runner: ARC `trusted-environments-gh-aw` on Kubernetes, Ubuntu 24.04,
  Docker 29.5.1
- DOCKER_HOST: `unix:///var/run/docker.sock` (bind-mounted from sibling
  daemon pod)

