[aw-failures] Failure Investigator (6h) — 2026-06-11 08:25Z: 5 failure clusters across 12 runs #38556

@github-actions

Executive Summary

Fix the Gemini model router first — it routes text tasks to the unpriced gemini-3.1-flash-tts-preview model and has broken every Smoke Gemini run in the last 6h (P0, sub-issue #38558). Two content workflows are exhausting the 1000-AIC per-run budget; a Windows integration test is timing out; and the Git Simulator memory-branch push is blocked by a signed-commit rule. 9 recovered/stale auto-trackers were closed; 4 active trackers are linked below.

  • Window: 2026-06-11 02:25Z–08:25Z (6h)
  • Failed agentic runs analyzed: 12
  • Clusters: 5 active + 1 recovered
  • Status: ❌ 1 P0 active · ⚠️ 3 P1 active · 🟡 1 P2 active

Why a new parent report: No open Failure Investigator parent existed (all prior [aw-failures] reports are closed) and the P0 Gemini failure had only a thin per-run auto-tracker (#38515). This report consolidates coverage and surfaces 3 previously-untracked failures.

Failure Cluster Table

| # | Cluster | Workflows | Runs | Failure class | Root cause | Priority | Tracking |
|---|---|---|---|---|---|---|---|
| C1 | Gemini router → unpriced TTS model | Smoke Gemini | 4 | engine/API pricing misconfig | Router selects gemini-3.1-flash-tts-preview (no AIC pricing) → gemini-cli status 400 unknown_model_ai_credits on first call | P0 | #38515 + fix #38558 |
| C2 | AIC per-run budget exhaustion | Code Simplifier, Workflow Skill Extractor | 2 | engine/API 429 | Per-run AI-credit cap (1000) exceeded (~1014–1025) after 5 retries → harness budget abort | P1 | #38499, #38501 |
| C3 | Windows integration timeout | Daily Windows CLI Integration | 1 | build/integration timeout | gh-aw.exe compile --help exceeded 30000ms in Windows scenario-matrix step (build+verify passed) | P1 | untracked |
| C4 | Memory-branch push blocked | Daily Safe Outputs Git Simulator | 1 | permissions / repo policy | Orphan branch memory/git-simulator push declined — GH013 signed-commit ruleset (3-day streak; agent succeeded) | P1 | untracked |
| C5 | Safe-output validation | LintMonster, Smoke Copilot | 2 | safe-output validation | update_issue "Target is 'triggering' but not running in issue context" (digest mismatch); add_labels/remove_labels "No issue/PR number available" on workflow_dispatch | P2 | recovered |
| C6 | Recovered / stale | Codex, Pi, Antigravity, Copilot, Copilot-AOAI, jsweep, PR Sous Chef, Conformance Checker | | mixed | Latest runs green | | 9 trackers closed |

Evidence

C1 — Gemini router → unpriced TTS model (P0)

Audited runs §27327422881 and §27331325693 — identical signature, both fail on the first API call (0 tokens, 0 tool calls):

status: 400
[API Error: {"type":"unknown_model_ai_credits","message":"Model \"gemini-3.1-flash-tts-preview\" has no AI credits pricing and no default pricing is configured. Set apiProxy.defaultAiCreditsPricing in the AWF config ... or add the model to the pricing table.","model":"gemini-3.1-flash-tts-preview"}]

Anomaly: the router (NumericalClassifierStrategy.route, ModelRouterService.route) selected a TTS preview model for a text task; the configured text models were gemini-2.5-flash-lite and gemini-3.1-pro-preview. All 4 runs in the window (27327422881, 27331325693, 27329688408, 27320398264) share this signature.
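
One possible guard against this failure class is to check the router's choice against the pricing table before dispatching the first call. The sketch below is illustrative only: `PRICING`, `pick_model`, and the price values are hypothetical placeholders, not the actual AWF pricing table or any gh-aw / gemini-cli API.

```python
# Hypothetical guard: never dispatch to a model that has no AI-credit pricing.
# PRICING and its values are placeholders, not the real AWF pricing table.
PRICING = {
    "gemini-2.5-flash-lite": 1.0,
    "gemini-3.1-pro-preview": 5.0,
}


def pick_model(router_choice: str, fallback: str) -> str:
    """Return router_choice if it is priced, otherwise the configured fallback."""
    if router_choice in PRICING:
        return router_choice
    if fallback not in PRICING:
        # Fail loudly at selection time instead of on the first API call.
        raise ValueError(f"fallback model {fallback!r} has no pricing either")
    return fallback


# The TTS preview model from the failing runs is unpriced, so the fallback wins.
print(pick_model("gemini-3.1-flash-tts-preview", "gemini-2.5-flash-lite"))
```

Under these assumptions the unpriced TTS model is replaced before any request is sent. Whether the real fix belongs in the router, the pricing table, or `apiProxy.defaultAiCreditsPricing` is left to sub-issue #38558.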

C2 — AIC per-run budget exhaustion (P1)
  • Code Simplifier §27324513461: CAPIError: 429 Maximum AI credits exceeded (1014.39 / 1000), "Max AI credits exceeded (harness budget abort): true". Failing 3 consecutive daily runs.
  • Workflow Skill Extractor §27325185979: CAPIError: 429 Maximum AI credits exceeded (1025.08 / 1000).

No tool/permission/firewall blocks; agent aborts after 5 failed model-request retries.
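
To put the two overruns in proportion, the arithmetic can be sketched as follows. The cap and spend figures come from the 429 errors above; `overrun` and `can_afford` are hypothetical helpers, and the pre-flight check is an assumption about how a harness could stop early, not a description of the current harness behavior.

```python
CAP = 1000.0  # per-run AI-credit cap reported in the 429 errors


def overrun(spent: float, cap: float = CAP) -> tuple:
    """Return (credits over the cap, overrun as a percentage of the cap)."""
    over = spent - cap
    return round(over, 2), round(100 * over / cap, 2)


def can_afford(spent: float, estimated_next: float, cap: float = CAP) -> bool:
    """Hypothetical pre-flight check: refuse a request that would cross the cap."""
    return spent + estimated_next <= cap


for name, spent in [
    ("Code Simplifier", 1014.39),
    ("Workflow Skill Extractor", 1025.08),
]:
    print(name, overrun(spent))
```

Both runs finish only a few percent over the cap, which suggests the workloads sit close to the limit rather than running away.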

C3 / C4 / C5 — Untracked + validation failures
  • C3 Daily Windows CLI Integration §27332389236: TIMEOUT: 'D:\a\gh-aw\gh-aw\gh-aw.exe compile --help' exceeded 30000ms. Cross-compile + binary verify passed — this is a CI/integration matrix failure, not an agentic-engine failure.
  • C4 Daily Safe Outputs Git Simulator §27327037693: remote: error: GH013 ... refs/heads/memory/git-simulator ... requires signed commit ... push declined after 4 attempts. Agent succeeded (all 4 sim configs PASS). Failing 3 days straight.
  • C5 LintMonster §27322319441: agent succeeded; safe_outputs failed on digest-mismatch + 3× update_issue "Target is 'triggering' but not running in issue context". Smoke Copilot §27320477001: add_labels/remove_labels "No issue/PR number available" on workflow_dispatch (now recovered).
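
The C5 errors both come from a safe output that expects a triggering issue or PR when the event carries none (as with workflow_dispatch). A minimal sketch of resolving the target defensively is shown below. It relies only on the fact that GitHub issue and pull request event payloads carry `issue.number` or `pull_request.number`; `resolve_target_number` and `plan_label_update` are hypothetical names, and skipping rather than failing is an assumed design choice, not gh-aw's actual safe-output handling.

```python
from typing import Optional


def resolve_target_number(payload: dict) -> Optional[int]:
    """Return the issue/PR number from a GitHub event payload, or None."""
    for key in ("issue", "pull_request"):
        item = payload.get(key)
        if isinstance(item, dict) and "number" in item:
            return item["number"]
    return None


def plan_label_update(payload: dict, labels: list) -> dict:
    """Hypothetical: skip instead of failing when there is no triggering item."""
    number = resolve_target_number(payload)
    if number is None:
        return {"action": "skip", "reason": "no triggering issue or PR"}
    return {"action": "add_labels", "number": number, "labels": labels}


# A workflow_dispatch-style payload has inputs but no issue or pull_request.
print(plan_label_update({"inputs": {}}, ["smoke"]))
print(plan_label_update({"issue": {"number": 7}}, ["smoke"]))
```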

Existing Issue Correlation

Fix Roadmap

Issues Closed (9)

Recovered/stale auto-trackers closed with evidence: #38535, #38517 (dup), #38512, #38513, #38519, #38542, #38530, #38540, #38505.

Sub-Issues

References: §27327422881 · §27324513461 · §27332389236
