[aw-failures] P1: Daily Cache Strategy Analyzer — Codex model gpt-5-codex-alpha-2025-11-07 returns 404 (fallback gpt-5-codex als [Content truncated due to length] #39451

Description

@github-actions

Unpin the dead Codex alpha model on Daily Cache Strategy Analyzer — it 404s every run and the fallback resolves to the same dead model.

Severity: P1 — 100% agent-job outage for any Codex workflow pinned to gpt-5-codex-alpha-2025-11-07.

Problem statement

The Codex CLI agent fails every turn with "404 Not Found: Model not found gpt-5-codex-alpha-2025-11-07" against the api-proxy (172.30.0.30/redacted). The harness burns all 4 retries (5 reconnect attempts each) and the agent job exits 1. The configured fallback --model gpt-5-codex does NOT recover: codex logs "WARN Unknown model gpt-5-codex is used. This will use fallback model metadata", and the request still resolves server-side to the dead gpt-5-codex-alpha-2025-11-07, returning 404 again.

Affected workflows and run IDs

  • Daily Cache Strategy Analyzer (.github/workflows/daily-cache-strategy-analyzer.lock.yml) — run §27571281247
  • Any other Codex-engine workflow pinned to the gpt-5-codex-alpha-2025-11-07 alpha model is exposed to the same outage.

Probable root cause

The alpha model id gpt-5-codex-alpha-2025-11-07 was removed/renamed on the inference backend. The model pin (and the api-proxy fallback mapping) still point at it, so both the primary model and the gpt-5-codex fallback dereference to a non-existent model.
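The suspected double dereference can be illustrated with a minimal sketch. It assumes the fallback mapping behaves like a simple alias table; the table shape, the "served" set, and the placeholder id live-codex-model-placeholder are hypothetical and are not taken from the real api-proxy configuration.

```python
# Hypothetical model: the real api-proxy mapping format is not shown in this issue.
DEAD_ALPHA = "gpt-5-codex-alpha-2025-11-07"


def resolve(model, alias_map):
    """Follow alias entries until reaching an id with no further mapping."""
    seen = []
    current = model
    while current in alias_map and current not in seen:
        seen.append(current)  # guard against alias cycles
        current = alias_map[current]
    return current


def dead_routes(requested, alias_map, served):
    """Return {requested id: resolved id} for every id that resolves to an unserved model."""
    routes = {}
    for model in requested:
        target = resolve(model, alias_map)
        if target not in served:
            routes[model] = target
    return routes


# Assumed state today: the fallback alias still points at the removed alpha id.
served = {"live-codex-model-placeholder"}  # placeholder, not a real model id
broken_map = {"gpt-5-codex": DEAD_ALPHA}
fixed_map = {"gpt-5-codex": "live-codex-model-placeholder"}

print(dead_routes([DEAD_ALPHA, "gpt-5-codex"], broken_map, served))
print(dead_routes(["gpt-5-codex"], fixed_map, served))
```

Under these assumptions, both the pinned id and the fallback land on the same unserved model until the alias is repointed, which matches the two consecutive 404s seen in the run.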

Proposed remediation

  1. Repoint the workflow/engine model from gpt-5-codex-alpha-2025-11-07 to a currently-served Codex model id.
  2. Fix the api-proxy fallback map so gpt-5-codex resolves to a live model instead of re-resolving to the dead alpha id.
  3. Make the harness treat a 404 model-not-found as isInvalidModelError=true (it currently logs isInvalidModelError=false) so it fails fast with a clear classification instead of exhausting 4 retries.
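Item 3 could look roughly like the sketch below. This is not the harness code: ApiError, InvalidModelError, is_invalid_model_error, and run_with_retries are hypothetical names, the match on the "Model not found" text is an assumption based on the error quoted in the problem statement, and backoff between attempts is omitted for brevity.

```python
class ApiError(Exception):
    """Hypothetical wrapper for an HTTP error returned by the api-proxy."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status
        self.message = message


class InvalidModelError(Exception):
    """Non-retryable: the requested model id does not exist upstream."""


def is_invalid_model_error(err):
    # Assumption: a dead model shows up as HTTP 404 with "Model not found" in the body.
    return err.status == 404 and "model not found" in err.message.lower()


def run_with_retries(call, max_attempts=4):
    """Retry transient API errors, but fail on the first invalid-model error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ApiError as err:
            if is_invalid_model_error(err):
                # Retrying cannot help: surface a clear classification immediately.
                raise InvalidModelError(f"attempt {attempt}: {err.message}") from err
            if attempt == max_attempts:
                raise  # transient error persisted through every attempt
```

With this shape, a request pinned to gpt-5-codex-alpha-2025-11-07 would stop after the first 404, while other API errors (for example a 503) would still be retried up to max_attempts times.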

Success criteria / verification

  • Next scheduled Daily Cache Strategy Analyzer run reaches the agent turn without a 404, agent job conclusion = success.
  • A deliberately-pinned dead model id is classified as isInvalidModelError=true and fails on attempt 1 (no 4× retry storm).

Parent: #39344. Analyzed run: 27571281247.
Related to #39344

Generated by 🔍 [aw] Failure Investigator (6h) · 572.8 AIC · ⌖ 11.7 AIC · ⊞ 4.5K ·

  • expires on Jun 22, 2026, 12:20 PM UTC-08:00
