fix: add concurrency groups to prevent engine rate limiting #5662

Merged: lpcox merged 5 commits into main from fix-rootless-chroot-home-cleanup on Jun 28, 2026

@lpcox lpcox commented Jun 28, 2026

Collaborator

Summary

Adds concurrency groups to security-guard and test-coverage-reporter workflows to prevent multiple simultaneous runs from competing for copilot engine API quota.

Problem

All three workflow failures (#5650, #5652, #5654) were caused by transient copilot engine failures.

These failures occurred because multiple workflows were triggered concurrently on the same PR/branch, exhausting the provider's rate limits.

Fix

Added concurrency configuration to the two workflows that were missing it:

  • security-guard: group: "security-guard-${{ github.event.pull_request.number || github.ref }}" with cancel-in-progress: true
  • test-coverage-reporter: group: "test-coverage-reporter-${{ github.ref }}" with cancel-in-progress: true

The contribution-check workflow already had a concurrency group configured.

This ensures that redundant runs are cancelled, reducing concurrent load on the copilot engine API and preventing rate limit exhaustion.
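
As a rough illustration of how the two group expressions resolve, the sketch below models them in TypeScript. `RunContext`, `securityGuardGroup`, `testCoverageReporterGroup`, and `startRun` are hypothetical names made up for this example; the real evaluation and cancellation happen inside GitHub Actions, not in repository code.

```typescript
// Hypothetical model of the context fields the group expressions read.
interface RunContext {
  ref: string;                // github.ref
  pullRequestNumber?: number; // github.event.pull_request.number (PR events only)
}

// Models "security-guard-${{ github.event.pull_request.number || github.ref }}".
function securityGuardGroup(ctx: RunContext): string {
  return `security-guard-${ctx.pullRequestNumber || ctx.ref}`;
}

// Models "test-coverage-reporter-${{ github.ref }}".
function testCoverageReporterGroup(ctx: RunContext): string {
  return `test-coverage-reporter-${ctx.ref}`;
}

// Models cancel-in-progress: true. Starting a run in a group that already has
// an active run replaces it and returns the id of the run that gets cancelled.
function startRun(
  active: Map<string, number>,
  group: string,
  runId: number,
): number | undefined {
  const cancelled = active.get(group);
  active.set(group, runId);
  return cancelled;
}

// Two runs triggered for the same PR land in the same group.
const activeRuns = new Map<string, number>();
const prContext: RunContext = { ref: "refs/pull/5662/merge", pullRequestNumber: 5662 };
startRun(activeRuns, securityGuardGroup(prContext), 1);
const cancelledRun = startRun(activeRuns, securityGuardGroup(prContext), 2);
console.log(securityGuardGroup(prContext), cancelledRun);
```

Under this model, two runs for the same PR share one group, so starting the second cancels the first, while runs for different PRs or refs proceed independently.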

Fixes #5650, #5652, #5654

lpcox and others added 4 commits June 28, 2026 10:07
The API proxy was returning HTTP 403 when the max-runs limit was hit,
which caused the Copilot CLI harness to misinterpret it as an
authentication failure rather than a turn budget exhaustion. Change the
status code to 429 (Too Many Requests) which more accurately reflects
the condition and allows clients to distinguish auth failures from
resource limits.
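
To make the status-code distinction concrete, here is a minimal sketch of a max-runs guard and a client-side classifier. All names (`GuardResult`, `checkMaxRuns`, `classifyFailure`), the message text, and the `>=` threshold are assumptions for illustration; the actual guard in the API proxy may be structured differently.

```typescript
// Hypothetical result shape for a proxy guard check.
interface GuardResult {
  allowed: boolean;
  status?: number;
  message?: string;
}

// Assumed semantics: reject once the number of runs reaches the limit.
function checkMaxRuns(runsSoFar: number, maxRuns: number): GuardResult {
  if (runsSoFar >= maxRuns) {
    // 429 Too Many Requests signals a resource limit, not an auth problem.
    return { allowed: false, status: 429, message: "max runs exceeded" };
  }
  return { allowed: true };
}

type FailureKind = "auth" | "budget" | "other";

// How a client could tell the two conditions apart once the guard uses 429.
function classifyFailure(status: number): FailureKind {
  if (status === 401 || status === 403) return "auth";
  if (status === 429) return "budget";
  return "other";
}

const blockedResult = checkMaxRuns(4, 4);
console.log(blockedResult.status, classifyFailure(blockedResult.status ?? 0));
```

With a 403 the same classifier would report an authentication failure, which is the misreading the commit describes.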

Also bumps the contribution-check workflow max-turns from 3 to 4 to
give the agent enough headroom to complete its task.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update test assertion to match the new max-turns value in the
contribution-check workflow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

In rootless Docker mode, files created inside the agent container
are owned by remapped UIDs that the host process cannot delete.
This caused cleanup to fail with EACCES when removing the
chroot-home directory.

Apply the same rootless permission repair pattern already used for
artifact directories: on EACCES, spin up a short-lived container
with CHOWN/DAC_OVERRIDE/FOWNER capabilities to fix ownership,
then retry removal.
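
The retry pattern described above can be sketched roughly as follows. `removeWithRepair` and its two callbacks are hypothetical stand-ins: in the real change the removal targets the chroot-home directory and the repair step launches a short-lived container, neither of which is reproduced here.

```typescript
// Hypothetical helper: `remove` deletes the directory, `repair` fixes ownership
// (in the PR, via a short-lived container with CHOWN/DAC_OVERRIDE/FOWNER).
function removeWithRepair(remove: () => void, repair: () => void): void {
  try {
    remove();
  } catch (err) {
    const code = (err as { code?: string } | null)?.code;
    // Only permission errors trigger the repair path; anything else propagates.
    if (code !== "EACCES") throw err;
    repair();
    remove(); // Single retry; a second failure is surfaced to the caller.
  }
}

// Simulated rootless scenario: removal fails until ownership is repaired.
let ownershipFixed = false;
const steps: string[] = [];
removeWithRepair(
  () => {
    steps.push("remove");
    if (!ownershipFixed) {
      throw Object.assign(new Error("permission denied"), { code: "EACCES" });
    }
  },
  () => {
    steps.push("repair");
    ownershipFixed = true;
  },
);
console.log(steps.join(" -> "));
```

Taking the two operations as callbacks keeps the sketch free of file-system and Docker calls, so the control flow can be exercised without a rootless setup.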

Fixes github/gh-aw#42101

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add concurrency groups to security-guard and test-coverage-reporter
workflows to prevent multiple simultaneous runs from competing for
copilot engine API quota. This addresses transient 403/429 failures
caused by concurrent workflow runs exhausting provider rate limits.

- security-guard: scoped per PR number, cancel-in-progress
- test-coverage-reporter: scoped per ref, cancel-in-progress

Fixes #5650, #5652, #5654

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 28, 2026 23:37
…ome-cleanup

# Conflicts:
#	src/artifact-preservation.ts
@github-actions

Contributor

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 98.16% | 98.20% | 📈 +0.04% |
| Statements | 98.10% | 98.13% | 📈 +0.03% |
| Functions | 99.54% | 99.54% | ➡️ +0.00% |
| Branches | 94.14% | 94.14% | ➡️ +0.00% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/workdir-setup.ts | 92.7% → 94.5% (+1.82%) | 92.7% → 94.5% (+1.82%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Copilot AI left a comment

Contributor

Pull request overview

This PR primarily aims to reduce Copilot engine quota/rate-limit contention by serializing workflow runs via GitHub Actions concurrency, while also adjusting several related runtime/guard behaviors and regenerating the compiled .lock.yml workflows.

Changes:

  • Add/adjust concurrency groups (and cancel-in-progress) for agentic workflows to reduce simultaneous runs.
  • Improve cleanup robustness by passing image/prefix context into workdir removal and attempting rootless permission repair on deletion failure.
  • Update API-proxy “max runs” guard behavior/tests (status code semantics) and increase contribution-check max-turns (with corresponding test/lock updates).
Show a summary per file
| File | Description |
|---|---|
| src/container-cleanup.ts | Passes runtime image/prefix context into work directory removal during cleanup. |
| src/artifact-preservation.ts | Adds options plumbing and rootless permission-repair retry when removing chroot home. |
| scripts/ci/contribution-check-workflow.test.ts | Updates assertion for contribution-check max-turns value. |
| containers/api-proxy/server.websocket.test.js | Updates WebSocket guard test to expect 429 for max-runs exceeded. |
| containers/api-proxy/server.token-guards.test.js | Updates HTTP guard test to expect 429 and message for max-runs exceeded. |
| containers/api-proxy/guards/common-guard-checks.js | Changes max-runs guard status code to 429 and updates rationale comment. |
| .github/workflows/test-coverage-reporter.md | Adds workflow-level concurrency group for reporter runs. |
| .github/workflows/test-coverage-reporter.lock.yml | Regenerated compiled workflow reflecting concurrency and other compiler output changes. |
| .github/workflows/security-guard.md | Adds workflow-level concurrency group for Security Guard runs. |
| .github/workflows/security-guard.lock.yml | Regenerated compiled workflow reflecting concurrency and other compiler output changes. |
| .github/workflows/contribution-check.md | Increases max-turns and retains concurrency configuration. |
| .github/workflows/contribution-check.lock.yml | Regenerated compiled workflow reflecting max-turns and other compiler output changes. |

Review details

  • Files reviewed: 5/5 changed files
  • Comments generated: 0
  • Review effort level: Low

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Build Test Suite completed successfully!

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

🔑 Smoke Copilot PAT: PAT auth validated. All systems operational. ✅

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Gemini completed. All facets verified. 💎

@github-actions

Contributor

🚀 Security Guard has started processing this pull request

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Contribution Check completed successfully!

Contribution guidelines review complete for PR #5662: no important missing items found; no comment needed.

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Claude passed

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

🔌 Smoke Services — All services reachable! ✅

@github-actions github-actions Bot mentioned this pull request Jun 28, 2026
@github-actions

Contributor

Smoke Test: Claude Engine Validation

  • API status: ✅ PASS
  • gh check: ✅ PASS
  • File status: ✅ PASS

Overall result: PASS

Generated by Smoke Claude for #5662 · 58.3 AIC · ⊞ 6.2K

@github-actions

Contributor

🔥 Smoke Test Results — Copilot PAT Auth

| Test | Result |
|---|---|
| GitHub MCP connectivity | |
| GitHub.com HTTP connectivity | ✅ (200) |
| File write/read | |

Overall: PASS

Auth mode: PAT (COPILOT_GITHUB_TOKEN)
PR author: @lpcox

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

Contributor

Smoke Test: Copilot BYOK (Direct) Mode

GitHub MCP Connectivity: PR data retrieved successfully
Inference Path: Direct BYOK mode via api-proxy sidecar → api.githubcopilot.com
BYOK Placeholder Injection: Agent received dummy credential, real key held by sidecar

Status: PASS — Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY)

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

Contributor
  • GitHub MCP Testing: ✅
  • GitHub.com Connectivity: ✅
  • File Write/Read Test: ✅
  • BYOK Inference Test: ✅
    Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) authenticated via Microsoft Entra
    Overall status: PASS

cc @lpcox

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

Contributor

@lpcox

  • GitHub MCP PR fetch: ✅
  • GitHub.com HTTP: ✅
  • File I/O: ✅
  • BYOK inference: ✅

Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)

PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

@github-actions

Contributor

🤖 Smoke Test Results

PR: fix: add concurrency groups to prevent engine rate limiting — @lpcox

| Test | Result |
|---|---|
| GitHub MCP connectivity | |
| GitHub.com HTTP | ⚠️ pre-step data unavailable |
| File write/read | ⚠️ pre-step data unavailable |

Overall: ⚠️ PARTIAL — Copilot engine is responding; pre-step outputs (SMOKE_HTTP_CODE, SMOKE_FILE_CONTENT, SMOKE_FILE_PATH) were not resolved (template substitution failed in CI).

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Contributor

Smoke test results:

  • fix: handle EACCES during chroot-home cleanup in rootless Docker ✅
  • docs(runner-doctor): add C7 failure mode for DIFC probe on *.ghe.com ✅
  • fix: add concurrency groups to prevent engine rate limiting ✅
  • refactor: deduplicate BYOK COPILOT_MODEL test scaffolding ✅
    Overall: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

Contributor

Chroot Runtime Version Comparison

| Runtime | Host Version | Chroot Version | Match? |
|---|---|---|---|
| Python | Python 3.12.13 | Python 3.12.3 | |
| Node.js | v24.17.0 | v22.23.0 | |
| Go | go1.22.12 | go1.22.12 | |

Result: ❌ FAILED — Python and Node.js versions differ between host and chroot environments. The smoke-chroot label was not applied.

Tested by Smoke Chroot

@github-actions

Contributor

🏗️ Build Test Suite Results

| Ecosystem | Project | Build/Install | Tests | Status |
|---|---|---|---|---|
| Bun | elysia | | 1/1 passed | ✅ PASS |
| Bun | hono | | 1/1 passed | ✅ PASS |
| C++ | fmt | | N/A | ✅ PASS |
| C++ | json | | N/A | ✅ PASS |
| Deno | oak | N/A | 1/1 passed | ✅ PASS |
| Deno | std | N/A | 1/1 passed | ✅ PASS |
| .NET | hello-world | | N/A | ✅ PASS |
| .NET | json-parse | | N/A | ✅ PASS |
| Go | color | | passed | ✅ PASS |
| Go | env | | passed | ✅ PASS |
| Go | uuid | | passed | ✅ PASS |
| Java | gson | | 1/1 passed | ✅ PASS |
| Java | caffeine | | 1/1 passed | ✅ PASS |
| Node.js | clsx | | all passed | ✅ PASS |
| Node.js | execa | | all passed | ✅ PASS |
| Node.js | p-limit | | all passed | ✅ PASS |
| Rust | fd | | 1/1 passed | ✅ PASS |
| Rust | zoxide | | 1/1 passed | ✅ PASS |

Overall: 8/8 ecosystems passed — ✅ PASS

Note (Java): The default ~/.m2 directory was owned by root in this runner, so Maven tests used -Dmaven.repo.local=/tmp/gh-aw/agent/m2-repo to resolve the local repository path. All compiles and tests succeeded.

Generated by Build Test Suite for #5662 · 42.8 AIC · ⊞ 7.8K

@github-actions

Contributor

Smoke Test: API Proxy OTEL Tracing — Results

| Scenario | Status | Detail |
|---|---|---|
| 1. Module Loading | | otel.js loads; exports startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled + internals. isEnabled() → true (FileSpanExporter fallback active by design) |
| 2. Test Suite | | 59/59 tests passed across otel.test.js and otel-fanout.test.js |
| 3. Env Var Forwarding | | src/services/api-proxy-env-config.ts forwards GH_AW_OTLP_ENDPOINTS, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, OTEL_SERVICE_NAME |
| 4. Token Tracker Integration | | onUsage callback exists in token-tracker-http.js; wired to otel.setTokenAttributes() + otel.setBudgetAttributes() in upstream-token.js |
| 5. OTEL Diagnostics | | 1 span exported to /tmp/gh-aw/otel.jsonl: gh-aw.agent.setup span with trace/parent linkage and full resource attributes |

All 5 scenarios pass. OTEL tracing integration is functional.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

Contributor

Smoke Test: Gemini Engine Validation

Overall status: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@github-actions

Contributor

Smoke Test: GitHub Actions Services Connectivity

| Check | Result |
|---|---|
| Redis PING | ❌ connection timeout |
| PostgreSQL pg_isready | ❌ no response |
| PostgreSQL SELECT 1 | ❌ connection timeout |

host.docker.internal resolves to 172.17.0.1 but TCP connections to ports 6379 and 5432 both time out — service containers appear unreachable from this runner.

Overall: FAIL

🔌 Service connectivity validated by Smoke Services

@lpcox lpcox merged commit 6ab66e3 into main Jun 28, 2026
89 checks passed
@lpcox lpcox deleted the fix-rootless-chroot-home-cleanup branch June 28, 2026 23:59
Development

Successfully merging this pull request may close these issues.

[aw] Contribution Check failed