fix(doc-maintainer): prevent maxRuns 403 from wasted shell turns #5564
The Documentation Maintainer workflow failed daily (since 2026-06-20) with a misleading "Authentication failed with provider ... (HTTP 403)" engine error that auto-files a failure issue (#5552).

Root cause: the agent (claude-haiku-4.5, max-turns: 8) tried to run shell commands (`npm test`, `ls -la src/*.test.ts`) to "verify code examples", but the workflow sets `bash: false`. Each attempt returned "Permission denied and could not request permission from user", burning the agent's limited turn budget without producing any output. gh-aw maps `max-turns` to the api-proxy `maxRuns` hard cap, so once the 8 LLM invocations were exhausted the api-proxy returned a terminal, non-retryable 403 (max_runs_exceeded). The Copilot CLI surfaces that 403 as a generic "Authentication failed" credentials error, which the failure-issue reporter treats as an engine crash.

Fix (workflow-scoped):
- Tell the agent explicitly it has no shell access and must not attempt `git`, `npm test`, `ls`, or any shell command (they fail and waste turns), and that code-example verification is read-only.
- Raise max-turns 8 -> 15 so the agent has budget headroom to finish before tripping the api-proxy maxRuns cap.

Recompiled doc-maintainer.lock.yml (maxRuns/GH_AW_MAX_TURNS -> 15).

Fixes #5552

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR updates the Documentation Maintainer agentic workflow to avoid wasting turns on disallowed shell usage (which can exhaust the api-proxy maxRuns budget and surface as a misleading Copilot “Authentication failed (403)” error), and increases the turn budget to provide more headroom for completing the documentation sync task.
Changes:
- Increase the workflow agent budget from `max-turns: 8` → `15`.
- Update the agent prompt to explicitly forbid shell-based verification and emphasize read-only verification via precomputed context and file reading.
- Regenerate `doc-maintainer.lock.yml` so `maxRuns`/`GH_AW_MAX_TURNS` match the new `max-turns` value.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/doc-maintainer.md | Bumps max-turns and strengthens the prompt instructions to avoid shell attempts and wasted turns. |
| .github/workflows/doc-maintainer.lock.yml | Regenerated lock to reflect maxRuns: 15 and GH_AW_MAX_TURNS: 15. |
Review details
- Files reviewed: 2/2 changed files
- Comments generated: 2
- Review effort level: Low
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
✅ Copilot review passed with no inline comments. @lpcox Add the
🚀 Security Guard has started processing this pull request
✅ Build Test Suite completed successfully!
✅ Smoke Gemini completed. All facets verified. 💎 Testing bridge connectivity
✅ Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓
Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.
✅ Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓
✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟
🔌 Smoke Services: All services reachable! ✅
🔑 Smoke Copilot PAT: PAT auth validated. All systems operational. ✅
📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅
✅ Contribution Check completed successfully! Contribution guidelines review complete: PR #5564 follows the applicable CONTRIBUTING.md requirements based on the provided metadata, diff, and contributing guide; no PR comment needed.
✅ Smoke Claude passed
📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤
✅ Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓
Smoke Test: Claude Engine Validation
Overall result: PASS
Smoke Test: Copilot BYOK (Direct) ✅ PASS
Mode: Direct BYOK (COPILOT_PROVIDER_API_KEY) via api-proxy sidecar
Result: All checks passed. Workflow is processing correctly.
🔬 Smoke Test Results: PASS
PR: fix(doc-maintainer): prevent maxRuns 403 from wasted shell turns by @lpcox
Overall: PASS (2/2 live checks passed; pre-step data unavailable due to unsubstituted template vars)
Smoke Test
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
See Network Configuration for more information.
🔥 Smoke Test Results: Auth mode: PAT (COPILOT_GITHUB_TOKEN)
Overall: PASS (core connectivity verified)
PR: fix(doc-maintainer): prevent maxRuns 403 from wasted shell turns
🔭 Smoke Test: API Proxy OpenTelemetry Tracing
All scenarios pass. OTEL tracing integration is functional on this PR's branch.
Smoke Test Results
Overall status: FAIL
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "localhost"
See Network Configuration for more information.
@lpcox ✅ GitHub MCP test
🧪 Chroot Version Comparison Results
Overall: ❌ FAILED. Python and Node.js versions differ between host and chroot environments.
Smoke Test: GitHub Actions Services Connectivity
Overall: FAIL
Smoke Test Results for PR #5564
Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)
Overall: PASS
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed: ✅ PASS
#5587)
PR #5564 raised the Documentation Maintainer `max-turns` from 8 to 15 (to stop the agent exhausting its maxRuns budget on denied shell turns and surfacing a misleading 403) and rewrote the shell-restriction prompt line, but did not update scripts/ci/doc-maintainer-workflow.test.ts. This left `npm test` red with two stale assertions:
- `max-turns: 8` / `GH_AW_MAX_TURNS: 8` -> now 15
- `**Do not run any \`git\` commands**` -> replaced by the new `**Do not use the \`shell\` tool** ...` wording

Update the assertions to match the current source workflow and lock.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
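The kind of check that went stale can be sketched as follows. This is only an illustration of the assertions described in the commit message: `checkDocMaintainer` and `WorkflowCheck` are hypothetical names, the regular expressions are assumptions about how the settings appear in the files, and the real contents of scripts/ci/doc-maintainer-workflow.test.ts are not shown in this PR.

```typescript
// Hypothetical sketch; names and patterns are assumptions, not the real test file.
interface WorkflowCheck {
  description: string;
  ok: boolean;
}

// Takes the text of the workflow source (.md) and the compiled lock file (.lock.yml)
// and reports whether each expectation from the commit message holds.
function checkDocMaintainer(workflowSource: string, lockFile: string): WorkflowCheck[] {
  return [
    {
      description: "source sets max-turns: 15",
      ok: /^\s*max-turns:\s*15\s*$/m.test(workflowSource),
    },
    {
      description: "lock sets GH_AW_MAX_TURNS: 15",
      ok: /GH_AW_MAX_TURNS:\s*"?15\b/.test(lockFile),
    },
    {
      description: "prompt uses the new shell-restriction wording",
      ok: workflowSource.includes("**Do not use the `shell` tool**"),
    },
  ];
}

// Convenience helper: names of the checks that failed.
function failedChecks(workflowSource: string, lockFile: string): string[] {
  return checkDocMaintainer(workflowSource, lockFile)
    .filter((check) => !check.ok)
    .map((check) => check.description);
}
```

Run against the pre-#5564 files, such a check would report the `max-turns` and wording expectations as failed, which is the same kind of mismatch (in the opposite direction) that left `npm test` red.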
Problem
The Documentation Maintainer workflow has failed every day since 2026-06-20 (last success 06-19), each time auto-filing a failure issue (#5552). The reported error is a misleading "Authentication failed with provider ... (HTTP 403)".
This is not a credentials problem: the api-proxy health-check passes and the BYOK key is correctly configured.
Root cause
- The agent (`claude-haiku-4.5`, `max-turns: 8`) tried to "verify code examples" by running shell commands (`npm test`, `ls -la src/*.test.ts`), but the workflow sets `bash: false`. Each attempt returned `Permission denied and could not request permission from user`, wasting turns and producing no output (Changes +0 -0).
- gh-aw maps `max-turns` → the api-proxy `maxRuns` hard cap. The run's reflect snapshot confirms the budget was exhausted.
- Once the cap is reached, the api-proxy returns a terminal, non-retryable 403 (`max_runs_exceeded`; deliberately 403, not 429, so SDK clients don't retry-storm; see `containers/api-proxy/guards/common-guard-checks.js`). The Copilot CLI surfaces that 403 as a generic "Authentication failed" error, and the failure-issue reporter treats it as an engine crash.

So a benign turn-budget exhaustion (made worse by wasted shell attempts) masquerades as an auth failure.
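The interaction between wasted turns and the hard cap can be modelled with a small simulation. This is a sketch under stated assumptions: `RunBudget` and `simulateRun` are hypothetical names, the turn counts are illustrative rather than taken from the failing runs, and none of this is the actual guard code in `common-guard-checks.js`.

```typescript
// Hypothetical model of a per-run invocation cap; not the real api-proxy code.
type ProxyResponse =
  | { status: 200 }
  | { status: 403; error: "max_runs_exceeded" };

class RunBudget {
  private used = 0;
  private readonly maxRuns: number;

  constructor(maxRuns: number) {
    this.maxRuns = maxRuns;
  }

  // Every LLM invocation consumes budget, whether or not the turn was useful.
  invoke(): ProxyResponse {
    if (this.used >= this.maxRuns) {
      return { status: 403, error: "max_runs_exceeded" };
    }
    this.used += 1;
    return { status: 200 };
  }
}

// Simulate an agent that spends `wastedTurns` on denied tool calls and
// needs `usefulTurns` productive turns to finish its task.
function simulateRun(maxRuns: number, wastedTurns: number, usefulTurns: number): ProxyResponse {
  const budget = new RunBudget(maxRuns);
  for (let turn = 0; turn < wastedTurns + usefulTurns; turn++) {
    const response = budget.invoke();
    if (response.status === 403) {
      return response; // terminal: the run ends before the task is finished
    }
  }
  return { status: 200 };
}
```

With these illustrative numbers, `simulateRun(8, 4, 6)` ends in the 403, while `simulateRun(8, 0, 6)` and `simulateRun(15, 4, 6)` both complete. That mirrors the two halves of the fix: stop wasting turns, and leave headroom in the budget.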
Fix (workflow-scoped)
- Tell the agent explicitly that it has no shell access and must not attempt `git`, `npm test`, `ls`, or any shell command, and that code-example verification is read-only.
- Raise `max-turns` 8 → 15, so the agent can finish before tripping the api-proxy `maxRuns` cap.
- Recompiled `doc-maintainer.lock.yml` (`maxRuns`/`GH_AW_MAX_TURNS` → 15) and ran the smoke-workflow post-processor (no other lock files changed).
Why not change the api-proxy?
The 403-on-`max_runs_exceeded` behavior is intentional and documented in `common-guard-checks.js` (429 would cause retry-storms against a non-recovering cap). The mislabeling of that 403 as "Authentication failed" happens in the Copilot CLI / gh-aw harness, outside this repo. The correct, in-scope fix is to keep this workflow within its budget.
Fixes #5552
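The retry-storm argument for 403 over 429 can be illustrated with a generic client-side retry policy. This is a sketch of a typical policy (retry on 429 and 5xx, stop on other 4xx), assumed for illustration; `isRetryable` and `countAttempts` are hypothetical names and do not describe the Copilot CLI's or any specific SDK's actual logic.

```typescript
// Generic retry classification, assumed for illustration only.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Count how many requests a client sends when the server answers each
// attempt with the status produced by `statusFor`, up to `maxAttempts`.
function countAttempts(statusFor: (attempt: number) => number, maxAttempts: number): number {
  let attempts = 0;
  while (attempts < maxAttempts) {
    attempts += 1;
    const status = statusFor(attempts);
    if (!isRetryable(status)) {
      break; // success or terminal error: stop sending requests
    }
  }
  return attempts;
}
```

Under this policy a cap that answers 429 is hit `maxAttempts` times even though it will never recover within the run, whereas a 403 stops the client after a single request.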