
fix: handle EACCES during chroot-home cleanup in rootless Docker #5653

Merged

lpcox merged 5 commits into main from fix-rootless-chroot-home-cleanup on Jun 28, 2026


@lpcox

lpcox commented Jun 28, 2026

Collaborator

Problem

In rootless Docker mode (e.g. rootless network isolation on GitHub Actions runners), files created inside the agent container (like .aws/config) are owned by remapped UIDs. When AWF tries to clean up the chroot-home directory after execution, fs.rmSync fails with EACCES because the host process doesn't have permission to delete these remapped-UID files.

errno: -13,
code: 'EACCES',
syscall: 'unlink',
path: '/tmp/awf-1782658941006-chroot-home/.aws/config'

Fix

Apply the same rootless permission repair pattern already used for artifact directories (fixArtifactPermissionsForRootless):

  1. Attempt a normal rmSync first (the fast path when Docker is not running rootless)
  2. On EACCES, spin up a short-lived Docker container with CHOWN/DAC_OVERRIDE/FOWNER capabilities to chown the files back to the host user
  3. Retry rmSync after permission repair
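The three steps above can be sketched in TypeScript as follows. This is a minimal illustration, not the PR's code: `removeWithRootlessRetry`, `RepairFn`, and `RemoveFn` are hypothetical names, and the repair step is injected as a callback, whereas the PR's `removeWorkDirectories` calls `fixArtifactPermissionsForRootless` directly (its real signature may differ).

```typescript
import * as fs from 'fs';

// Hypothetical callback types; in the PR the repair is done by
// fixArtifactPermissionsForRootless rather than an injected function.
type RepairFn = (dir: string) => void;
type RemoveFn = (dir: string) => void;

const defaultRemove: RemoveFn = (dir) =>
  fs.rmSync(dir, { recursive: true, force: true });

function removeWithRootlessRetry(
  dir: string,
  repair: RepairFn,
  remove: RemoveFn = defaultRemove,
): void {
  try {
    // 1. Fast path: succeeds whenever the host user can delete everything under dir.
    remove(dir);
    return;
  } catch (error) {
    // Only EACCES triggers the repair; any other error is surfaced unchanged.
    const code = (error as { code?: unknown } | null)?.code;
    if (code !== 'EACCES') {
      throw error;
    }
  }
  // 2. Repair ownership (e.g. via a short-lived container with CHOWN/DAC_OVERRIDE/FOWNER).
  repair(dir);
  // 3. Retry exactly once; a second failure propagates to the caller.
  remove(dir);
}
```

Retrying only once keeps a genuine, unrepairable permission problem from turning into a loop, and rethrowing non-EACCES errors avoids masking unrelated failures.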

The cleanup() function now forwards dockerHostPathPrefix, imageRegistry, imageTag, and agentImage to removeWorkDirectories so it can perform the rootless repair.
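The forwarding amounts to picking the four rootless-repair fields out of the broader cleanup context. In this sketch only the four option names come from the PR; `CleanupContext`, its `keepContainers` field, and the `toRemoveOptions` helper are hypothetical stand-ins for whatever `src/container-cleanup.ts` actually uses.

```typescript
// Option names taken from the PR's removeWorkDirectories() signature.
type RemoveWorkDirectoriesOptions = {
  dockerHostPathPrefix?: string;
  imageRegistry?: string;
  imageTag?: string;
  agentImage?: string;
};

// Hypothetical, trimmed-down stand-in for the context cleanup() receives.
interface CleanupContext extends RemoveWorkDirectoriesOptions {
  workDir: string;
  keepContainers?: boolean;
}

// Forward only what removeWorkDirectories() needs for the rootless repair,
// so unrelated cleanup settings do not leak into the removal path.
function toRemoveOptions(ctx: CleanupContext): RemoveWorkDirectoriesOptions {
  const { dockerHostPathPrefix, imageRegistry, imageTag, agentImage } = ctx;
  return { dockerHostPathPrefix, imageRegistry, imageTag, agentImage };
}
```

Giving the options parameter a default of `{}`, as the PR's `removeWorkDirectories` signature does, keeps existing single-argument call sites compiling while cleanup() opts in to the rootless repair.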

Changes

  • src/artifact-preservation.ts: removeWorkDirectories now catches EACCES and retries after rootless permission repair
  • src/container-cleanup.ts: Pass rootless context options through to removeWorkDirectories

Fixes github/gh-aw#42101

lpcox and others added 3 commits June 28, 2026 10:07
The API proxy was returning HTTP 403 when the max-runs limit was hit,
which caused the Copilot CLI harness to misinterpret it as an
authentication failure rather than as an exhausted turn budget. Change the
status code to 429 (Too Many Requests), which more accurately reflects
the condition and allows clients to distinguish auth failures from
resource limits.

Also bumps the contribution-check workflow max-turns from 3 to 4 to
give the agent enough headroom to complete its task.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update test assertion to match the new max-turns value in the
contribution-check workflow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In rootless Docker mode, files created inside the agent container
are owned by remapped UIDs that the host process cannot delete.
This caused cleanup to fail with EACCES when removing the
chroot-home directory.

Apply the same rootless permission repair pattern already used for
artifact directories: on EACCES, spin up a short-lived container
with CHOWN/DAC_OVERRIDE/FOWNER capabilities to fix ownership,
then retry removal.

Fixes github/gh-aw#42101

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
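The first commit's rationale (403 reads as an auth failure, 429 as a resource limit) can be illustrated with a hypothetical client-side classifier. `classifyProxyStatus` and its categories are assumptions for illustration, not code from the Copilot CLI or the api-proxy. Note that this 403 → 429 change was later reverted from this PR to keep its scope on the cleanup fix.

```typescript
// Hypothetical categories a harness might handle differently.
type ProxyFailure = 'auth' | 'limit' | 'other';

// 401/403 signal credential problems; 429 signals that a quota or budget was hit.
function classifyProxyStatus(status: number): ProxyFailure {
  if (status === 401 || status === 403) {
    return 'auth';
  }
  if (status === 429) {
    return 'limit';
  }
  return 'other';
}
```

With the max-runs guard answering 403, a classifier like this would report an exhausted turn budget as an authentication problem, which is the misdiagnosis the commit message describes.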
Copilot AI review requested due to automatic review settings June 28, 2026 18:55
@github-actions

Contributor

⚠️ Coverage Regression Detected

This PR decreases test coverage. Please add tests to maintain coverage levels.

Overall Coverage

| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 98.20% | 98.11% | 📉 -0.09% |
| Statements | 98.13% | 98.04% | 📉 -0.09% |
| Functions | 99.54% | 99.54% | ➡️ +0.00% |
| Branches | 94.19% | 93.98% | 📉 -0.21% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/artifact-preservation.ts | 100.0% → 92.4% (-7.61%) | 100.0% → 92.4% (-7.61%) |
| src/workdir-setup.ts | 92.7% → 94.5% (+1.82%) | 92.7% → 94.5% (+1.82%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot AI left a comment

Contributor


Pull request overview

This PR addresses a cleanup failure in rootless Docker scenarios by adding an EACCES-recovery path when deleting the *-chroot-home directory, reusing the existing “rootless permission repair” mechanism that chowns files back to the host UID/GID before retrying deletion.
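As an illustration of the permission-repair mechanism, here is a sketch of the kind of `docker run` invocation such a repair could use, built as an argument array. The image reference layout, the `/target` mount point, the `owner` parameter, and the way `dockerHostPathPrefix` is applied are all assumptions; the actual `fixArtifactPermissionsForRootless` may build its command differently. Which `UID:GID` to pass depends on the daemon's mapping: under rootless Docker, UID 0 inside the container corresponds to the host user running the daemon.

```typescript
import * as path from 'path';

// Hypothetical inputs; field names follow the PR's RemoveWorkDirectoriesOptions.
interface RepairOptions {
  dockerHostPathPrefix?: string;
  imageRegistry?: string;
  imageTag?: string;
  agentImage?: string;
}

// Build args for a throwaway container that changes ownership of dir.
function buildChownRepairArgs(dir: string, owner: string, opts: RepairOptions): string[] {
  // Assumed semantics: the daemon sees host paths under this prefix.
  const hostDir = opts.dockerHostPathPrefix ? path.join(opts.dockerHostPathPrefix, dir) : dir;
  // Assumed image layout: <registry>/<agentImage>:<tag>.
  const name = opts.agentImage ?? 'agent';
  const repo = opts.imageRegistry ? `${opts.imageRegistry}/${name}` : name;
  const image = `${repo}:${opts.imageTag ?? 'latest'}`;
  return [
    'run', '--rm',
    // Start from no capabilities, then grant only what a recursive chown needs.
    '--cap-drop=ALL',
    '--cap-add=CHOWN',
    '--cap-add=DAC_OVERRIDE',
    '--cap-add=FOWNER',
    '--user', '0:0',
    '-v', `${hostDir}:/target`,
    '--entrypoint', 'chown',
    image,
    '-R', owner, '/target',
  ];
}
```

The array could then be passed to something like `child_process.execFileSync('docker', args)`; passing arguments as an array rather than a shell string avoids quoting problems with unusual paths.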

Changes:

  • Forward rootless/docker image context from cleanup() into removeWorkDirectories() so permission repair can run during cleanup.
  • Add EACCES handling for *-chroot-home deletion: attempt rmSync, on EACCES run fixArtifactPermissionsForRootless, then retry rmSync.
  • Update contribution-check workflow turn limits and adjust API-proxy max-runs guard semantics/tests (403 → 429).
| File | Description |
|---|---|
| src/container-cleanup.ts | Passes docker/rootless context options through to removeWorkDirectories() during cleanup. |
| src/artifact-preservation.ts | Adds EACCES recovery for chroot-home deletion using rootless permission repair + retry. |
| scripts/ci/contribution-check-workflow.test.ts | Updates workflow assertion for max-turns (3 → 4). |
| containers/api-proxy/server.websocket.test.js | Updates WebSocket max-runs guard expectation to 429. |
| containers/api-proxy/server.token-guards.test.js | Updates HTTP max-runs guard expectation to 429 and asserts error message. |
| containers/api-proxy/guards/common-guard-checks.js | Changes max-runs guard status code to 429 and updates rationale comment. |
| .github/workflows/contribution-check.md | Increases workflow max-turns to 4. |
| .github/workflows/contribution-check.lock.yml | Regenerates locked workflow output consistent with updated workflow config (including maxRuns/max-turns). |

Review details


  • Files reviewed: 8/8 changed files
  • Comments generated: 4
  • Review effort level: Low

Comment thread src/artifact-preservation.ts Outdated
Comment on lines +223 to +230
type RemoveWorkDirectoriesOptions = {
  dockerHostPathPrefix?: string;
  imageRegistry?: string;
  imageTag?: string;
  agentImage?: string;
};

export function removeWorkDirectories(workDir: string, options: RemoveWorkDirectoriesOptions = {}): void {
Comment on lines +238 to +242
// In rootless Docker, files created inside the container may be owned by
// remapped UIDs that the host process cannot delete. Fix permissions via
// a privileged container, then retry removal.
if (error && typeof error === 'object' && 'code' in error && error.code === 'EACCES') {
  logger.debug('Chroot home removal failed with EACCES; attempting rootless permission repair');
Comment on lines +83 to +86
// Terminal hard cap: 429 signals the agent has exhausted its allowed
// turns. Unlike transient rate-limits, the Retry-After is absent so
// well-behaved clients won't retry indefinitely.
statusCode: 429,
Comment thread .github/workflows/contribution-check.md Outdated
 pull-requests: read
 issues: read
-max-turns: 3
+max-turns: 4
@lpcox

lpcox commented Jun 28, 2026

Collaborator Author

@copilot run pr-finisher skill

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

✅ Copilot review passed with no inline comments.

@lpcox Add the ready-for-aw label to this PR to trigger agentic CI smoke tests.

Copilot finished work on behalf of lpcox June 28, 2026 19:11
@lpcox

lpcox commented Jun 28, 2026

Collaborator Author

@copilot address review feedback

Copilot AI commented Jun 28, 2026

Contributor

@copilot address review feedback

Addressed in 6d8ce9d.

I reverted the unrelated workflow/API-proxy changes so this PR is back to the rootless cleanup fix, and kept the focused removeWorkDirectories EACCES regression coverage in place. Local build, lint, and test all passed.

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

@github-actions

Contributor

🚀 Security Guard has started processing this pull request

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

🔌 Smoke Services — All services reachable! ✅

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Claude passed

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Contribution Check completed successfully!

Contribution guidelines check complete for PR #5653: the PR includes tests for the cleanup change, has a clear description with a related issue reference, updates source files in the appropriate location, and does not appear to need documentation changes.

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Build Test Suite completed successfully!

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

🔑 Smoke Copilot PAT: PAT auth validated. All systems operational. ✅

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Gemini reported failures. Facets need polishing...

Smoke test failed: MCP and Connectivity tests ❌. File writing and Bash tools ✅. PR titles partially unavailable.

@github-actions

Contributor

⚠️ Coverage Regression Detected

This PR decreases test coverage. Please add tests to maintain coverage levels.

Overall Coverage

| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 98.20% | 98.20% | ➡️ +0.00% |
| Statements | 98.13% | 98.13% | ➡️ +0.00% |
| Functions | 99.54% | 99.54% | ➡️ +0.00% |
| Branches | 94.19% | 94.14% | 📉 -0.05% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/artifact-preservation.ts | 100.0% → 97.8% (-2.18%) | 100.0% → 97.8% (-2.18%) |
| src/workdir-setup.ts | 92.7% → 94.5% (+1.82%) | 92.7% → 94.5% (+1.82%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

@github-actions

Contributor

Smoke Test: Claude Engine Validation

  • API check: ✅ PASS
  • gh CLI check: ✅ PASS
  • File check: ✅ PASS

Overall result: PASS

Generated by Smoke Claude for #5653 · 58.4 AIC · ⊞ 3.3K

@github-actions

Contributor

🔥 Smoke Test Results

Test Status
GitHub MCP connectivity
GitHub.com HTTP
File write/read

PR: fix: handle EACCES during chroot-home cleanup in rootless Docker
Author: @lpcox

Overall: PASS

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Contributor

Smoke Test: Copilot PAT Auth — PARTIAL PASS

| Test | Result |
|---|---|
| GitHub MCP | ✅ Connected (fetched PR #5651) |
| GitHub.com HTTP | ✅ HTTP 200 |
| File Write/Read | ❓ Template vars not expanded (pre-step outputs missing) |

Overall: PARTIAL — pre-step template variables (${{ steps.smoke-data.outputs.* }}) were not substituted, so file test could not be verified.

Auth mode: PAT (COPILOT_GITHUB_TOKEN) | Author: @lpcox

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

Contributor

@lpcox

  • GitHub MCP API: ✅
  • GitHub.com connectivity: ✅
  • File I/O test: ✅
  • BYOK inference: ✅

Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw), authenticated via Microsoft Entra.
Overall: PASS

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

Contributor

Smoke Test: Copilot BYOK (Direct) ✅ PASS

✅ GitHub MCP connectivity verified
✅ GitHub.com HTTP 200
✅ File I/O working
✅ BYOK inference path active (api-proxy → api.githubcopilot.com)

Running in direct BYOK mode via COPILOT_PROVIDER_API_KEY.

@lpcox

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

Contributor

🔭 Smoke Test: API Proxy OpenTelemetry Tracing

| Scenario | Result | Notes |
|---|---|---|
| 1. Module Loading | ✅ Pass | otel.js loads cleanly; exports: startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled |
| 2. Test Suite | ✅ Pass | 59/59 tests passed, 0 failed (2 suites: otel.test.js, otel-fanout.test.js) |
| 3. Env Var Forwarding | ✅ Pass | api-proxy-env-config.ts forwards all OTEL vars: GH_AW_OTLP_ENDPOINTS, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, OTEL_SERVICE_NAME |
| 4. Token Tracker Integration | ✅ Pass | onUsage callback present in token-tracker-http.js as the OTEL hook point |
| 5. OTEL Diagnostics | ✅ Pass (N/A) | No live api-proxy in this env (Docker-in-Docker not supported); file exporter fallback path (/var/log/api-proxy/otel.jsonl) confirmed in code |

All scenarios pass. The OTEL tracing integration is complete and fully tested.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

Contributor

fix: handle EACCES during chroot-home cleanup in rootless Docker

✅ GitHub reads
✅ Playwright title check
✅ smoke-test file
✅ npm ci && npm run build

PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

Contributor

Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
|---|---|---|---|
| Python | Python 3.12.13 | Python 3.12.3 | ❌ |
| Node.js | v24.17.0 | v22.23.0 | ❌ |
| Go | go1.22.12 | go1.22.12 | ✅ |

Status: FAILED — Python and Node.js versions differ between host and chroot environments.

Tested by Smoke Chroot

@github-actions

Contributor

Smoke Test: Services Connectivity

| Check | Result |
|---|---|
| Redis PING | ❌ no response |
| PostgreSQL pg_isready | ❌ no response |
| PostgreSQL SELECT 1 | ❌ no response |

Overall: FAIL. host.docker.internal is unreachable from this runner environment. All three service checks timed out with no response.

🔌 Service connectivity validated by Smoke Services

@github-actions

Contributor

@lpcox
✅ MCP Testing: fix: handle EACCES during chroot-home cleanup; fix: correctly recover runner tool on PATH (after sudo w/ secure_path)
✅ GitHub.com Connectivity
✅ File Write/Read Test
✅ BYOK Inference Test
Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)
PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

@github-actions

Contributor

🏗️ Build Test Suite Results

| Ecosystem | Project | Build/Install | Tests | Status |
|---|---|---|---|---|
| Bun | elysia |  | 1/1 passed | ✅ PASS |
| Bun | hono |  | 1/1 passed | ✅ PASS |
| C++ | fmt |  | N/A | ✅ PASS |
| C++ | json |  | N/A | ✅ PASS |
| Deno | oak | N/A | 1/1 passed | ✅ PASS |
| Deno | std | N/A | 1/1 passed | ✅ PASS |
| .NET | hello-world |  | N/A | ✅ PASS |
| .NET | json-parse |  | N/A | ✅ PASS |
| Go | color |  | 1/1 passed | ✅ PASS |
| Go | env |  | 1/1 passed | ✅ PASS |
| Go | uuid |  | 1/1 passed | ✅ PASS |
| Java | gson |  | 1/1 passed | ✅ PASS |
| Java | caffeine |  | 1/1 passed | ✅ PASS |
| Node.js | clsx |  | all passed | ✅ PASS |
| Node.js | execa |  | all passed | ✅ PASS |
| Node.js | p-limit |  | all passed | ✅ PASS |
| Rust | fd |  | 1/1 passed | ✅ PASS |
| Rust | zoxide |  | 1/1 passed | ✅ PASS |

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for #5653 · 49.3 AIC · ⊞ 7.8K


Development

Successfully merging this pull request may close these issues.

[aw] Daily Issues Report Generator failed

3 participants