Skip to content

[aw-failures] [aw] Workflow Portfolio Analyst runs away to 95 turns and exhausts the 1000-AIC daily cap — agent job fails on 403 [Content truncated due to length] #40854

Description

@github-actions

Problem statement

Workflow Portfolio Analyst (Claude, claude-opus-4-8[1m]) failed when the provider returned 403 Maximum AI credits exceeded (1017.969875 / 1000) mid-run. The agent had already executed 95 turns over 27m19s, spending $10.18 / 8.85M cache-read tokens before the daily credit ceiling cut it off. The run produced 0 writes despite generating its chart assets.

Affected workflow and run IDs

Workflow Run Duration Turns Cost Terminal error
Workflow Portfolio Analyst §27971739618 27m19s 95 $10.18 403 Maximum AI credits exceeded (1017.97 / 1000)

Comparator (cohort baseline, success): §24658913450 — completed in 9 turns, write_capable.

Evidence

  1. Final result event: "is_error":true,"api_error_status":403,"num_turns":95,"result":"Failed to authenticate. API Error: 403 Maximum AI credits exceeded (1017.969875 / 1000)", total_cost_usd=10.18, cache_read_input_tokens=8,851,470.
  2. audit cohort comparison: classification changed (turns_increase, posture_changed) — turns 9 → 95 vs baseline; assessment resource_heavy_for_domain severity high (turns=95 tool_types=20 duration=27m19s write_actions=0).
  3. No firewall blocks (blocked_requests unchanged at 0) — this is a turn/budget runaway, not a network fault.

Probable root cause

The run expanded ~10x beyond its successful baseline turn count (9 → 95) with broad tool use (20 tool types) and never converged to emit its report, so it consumed the entire 1000-AIC daily budget and was terminated by the provider credit guard. Likely an unbounded analysis loop / missing turn ceiling rather than a single defect.

Proposed remediation

  1. Add a turn / iteration ceiling (or max-turns) to portfolio-analyst.md sized to the ~9-turn successful profile plus margin, so the workflow fails fast instead of draining the daily budget.
  2. Investigate why this run needed 95 turns (prompt/task-shape drift since the green baseline) and trim the analysis scope or pre-compute heavy aggregates.
  3. Consider a per-run AIC budget guard that aborts cleanly with a partial report before the global 1000 cap is hit.

Success criteria / verification

  • Next scheduled run completes success in turns comparable to the baseline (≤ ~15) without a 403 credit error.
  • Per-run cost returns to baseline range; daily 1000-AIC cap not consumed by a single run.

Correlation

Resource/cost theme overlaps the token-optimization tracking (#40812, #40784, #40783) but no existing issue covers this workflow's turn runaway. Parent report: #39883.
Related to #39883

Generated by 🔍 [aw] Failure Investigator (6h) · 245.5 AIC · ⊞ 4.9K ·

  • expires on Jun 29, 2026, 12:08 PM UTC-08:00

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions