[aw-failures] [aw] Copilot strict bash allowlist starves two more workflows — SPDD Spec Planner & Formal Spec Verifier hit max t [Content truncated due to length] #40853

Description

@github-actions

Problem statement

Two Copilot (BYOK claude-sonnet-4.6) workflows fail because their strict: true bash/tool allowlists deny routine read-only operations that the agent issues. After 5 denials, the Copilot SDK driver trips its guard.tool_denials_exceeded threshold and stops the session early ("max tool denials threshold reached (5/5)"). The agent job then exits with failure having produced 0 writes, after burning 16–21 AIC over a single 19–24 minute turn.

This is the same failure class as #40755 (Daily Compiler Threat Spec Optimizer), now confirmed on two additional workflows. The pattern is systemic to strict-allowlist Copilot workflows, not workflow-specific.

Affected workflows and run IDs

| Workflow | Run | Duration | Turns | Denied operation (representative) |
|---|---|---|---|---|
| Daily SPDD Spec Planner | §27971023563 | 19m31s | 1 | shell(sed -n '1,100p' /tmp/copilot-tool-output-*.txt) |
| Daily Formal Spec Verifier | §27970148606 | 24m31s | 1 | read(/home/runner/work/gh-aw/gh-aw/pkg/intent/resolver_test.go) |

Comparator (SPDD last green): §27504800878 — succeeded read-only in 1 turn on 2026-06-14.

Evidence

  1. Harness classification: failureClass=permission_denied, permissionDeniedCount=11, hasNumerousPermissionDenied=true; harness logs attempt 1: detected numerous permission-denied issues — not retrying (classified as missing tool/permission issue) and emits missing_tool.
  2. SDK driver: [sdk-driver] max tool denials threshold reached (5/5); stopping SDK session early.
  3. audit-diff (27971023563 vs green baseline 27504800878): no firewall anomalies (has_anomalies=false); the failed run consumed 16.577 AIC in a single 24m28s turn — confirming a tool-denial spin loop, not a network/provider fault.

Probable root cause

The strict: true allowlists enumerate exact command forms (e.g. shell(sed -n), specific shell(cat ...) globs) but the agent issues legitimate variants that fall outside them:

  • sed -n '1,100p' <file> (line-range read) vs the allowed bare sed -n,
  • the builtin read/view tool on Go source/test files not covered by the shell(cat ...) globs.

The 5-denial guard then aborts the whole session, so a few off-allowlist reads kill an otherwise-functional run.
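The interaction between exact-form matching and the denial guard can be sketched as follows. This is a minimal model, not the Copilot SDK's actual matcher: the allowlist entries, `is_allowed`, and `run_session` are hypothetical names chosen for illustration, and only the threshold of 5 comes from the driver log quoted above.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist entries; the real ones live in each workflow's frontmatter.
ALLOWLIST = ["shell(sed -n)", "shell(cat docs/*.md)"]
MAX_DENIALS = 5  # threshold reported in the SDK driver log


def is_allowed(request, allowlist=ALLOWLIST):
    """Assumed exact-form matching: a pattern must match the whole request."""
    return any(fnmatchcase(request, pattern) for pattern in allowlist)


def run_session(requests, allowlist=ALLOWLIST, max_denials=MAX_DENIALS):
    """Replay tool requests; return (completed, denials, aborted)."""
    completed = []
    denials = 0
    for request in requests:
        if is_allowed(request, allowlist):
            completed.append(request)
            continue
        denials += 1
        if denials >= max_denials:
            # The guard ends the session; later requests never run.
            return completed, denials, True
    return completed, denials, False


if __name__ == "__main__":
    probes = ["shell(sed -n '1,100p' /tmp/out.txt)"] * 5 + ["shell(sed -n)"]
    print(run_session(probes))
```

Under this model, five line-range reads are enough to abort the session before the one allowed request is reached, which matches the 0-write outcome described in the problem statement.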

Proposed remediation

  1. Broaden the bash allowlists in daily-spdd-spec-planner.md and daily-formal-spec-verifier.md to cover the read-range and file-read operations the agents actually need (e.g. sed -n with a line-range argument and a file path, and read/view of repo source/spec paths).
  2. Prefer a shared fix consistent with #40755 (Daily Compiler Threat Spec Optimizer): define a reusable read-only tool profile (sed/awk/head/tail/read of repo & spec paths) for strict spec-analysis workflows instead of per-workflow enumeration.
  3. Consider raising or making configurable the SDK tool_denials threshold (currently 5) so a handful of denied probes does not abort a long session.
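One way to picture the shared profile from item 2 is a check that matches on the command name rather than on an exact argument form. The sketch below is an assumption about how such a profile could behave, not the gh-aw or Copilot allowlist syntax; `READ_ONLY_COMMANDS` and `allowed_by_profile` are hypothetical names.

```python
import shlex

# Hypothetical shared profile: the read commands named in this issue.
READ_ONLY_COMMANDS = frozenset({"sed", "awk", "head", "tail", "cat"})


def allowed_by_profile(command_line, profile=READ_ONLY_COMMANDS):
    """Allow any argument form of a profiled command, except in-place sed edits."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return False  # unbalanced quotes: deny rather than guess
    if not tokens or tokens[0] not in profile:
        return False
    if tokens[0] == "sed":
        for token in tokens[1:]:
            if token.startswith("--in-place"):
                return False
            # Short-option clusters such as -i, -i.bak, or -ni request in-place edits.
            if token.startswith("-") and not token.startswith("--") and "i" in token:
                return False
    return True


if __name__ == "__main__":
    print(allowed_by_profile("sed -n '1,100p' /tmp/out.txt"))
    print(allowed_by_profile("sed -i s/a/b/ spec.md"))
```

Matching on the command name accepts the line-range form that the exact-form entry rejects. The sketch does not handle shell redirection or awk programs that write files or run commands, so a real profile would still need path scoping and stricter argument checks.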

Success criteria / verification

  • Next scheduled runs of both workflows complete with conclusion=success, >1 turn, and ≥1 intended safe output.
  • No guard.tool_denials_exceeded events in agent-stdio.log.
  • permissionDeniedCount < 5 per run.
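The criteria above could be checked mechanically once a run finishes. The following is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, and it assumes the run's conclusion, turn count, safe-output count, permissionDeniedCount, and the text of agent-stdio.log have already been collected by some other means.

```python
GUARD_EVENT = "guard.tool_denials_exceeded"
DENIAL_LIMIT = 5  # permissionDeniedCount must stay below this value


def unmet_criteria(conclusion, turns, safe_outputs, permission_denied_count, stdio_log):
    """Return a list of unmet success criteria; an empty list means the run passes."""
    unmet = []
    if conclusion != "success":
        unmet.append(f"conclusion={conclusion}")
    if turns <= 1:
        unmet.append(f"turns={turns} (need >1)")
    if safe_outputs < 1:
        unmet.append("no intended safe output")
    if GUARD_EVENT in stdio_log:
        unmet.append(f"{GUARD_EVENT} present in agent-stdio.log")
    if permission_denied_count >= DENIAL_LIMIT:
        unmet.append(f"permissionDeniedCount={permission_denied_count}")
    return unmet


if __name__ == "__main__":
    # Values modelled on the failed runs reported in this issue.
    print(unmet_criteria("failure", 1, 0, 11, "... guard.tool_denials_exceeded ..."))
```

Returning the list of unmet criteria rather than a single boolean keeps the report usable when a run fixes the denial loop but still misses another criterion, such as producing no safe output.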

Correlation

Same class as #40755 (strict allowlist denies sed/awk/read). Parent report: #39883.

Generated by 🔍 [aw] Failure Investigator (6h) · 245.5 AIC · ⊞ 4.9K

  • expires on Jun 29, 2026, 12:08 PM UTC-08:00
