[aw-failures] P0: Smoke CI agent crashes — Crush EROFS install failure & Gemini API key invalid #29666

@github-actions

Problem Statement

Two Smoke CI runs on 2026-05-02 failed at the agent job level before producing any output, blocking all downstream jobs. These are the highest-severity failures because they produce zero telemetry and must be fixed before the engines can be validated.

Affected Workflows & Runs

| Engine | Run ID | Duration | Failure Mode |
|---|---|---|---|
| Crush v0.59.0 | §25239718625 | ~1m | EROFS crash on binary install |
| Gemini CLI | §25239718609 | ~4m | HTTP 400 API_KEY_INVALID |

Root Cause Analysis

Crush — Read-Only Filesystem (EROFS)

Crush v0.59.0 is delivered as an npm package (@charmland/crush). During post-install, the package attempts to place the extracted binary at:

/opt/hostedtoolcache/node/24.14.1/x64/lib/node_modules/@charmland/crush/bin/

In the agentic workflow chroot/sandbox, the host filesystem is bind-mounted read-only. The directory exists on the host but cannot be written to inside the sandbox, producing:

EROFS: read-only file system, mkdir '/opt/hostedtoolcache/node/.../bin'

A secondary issue was also logged: "Failed to transfer /host/home/runner/work/_temp/gh-aw/safeoutputs ownership to chroot user". This indicates that the sandbox setup itself has a gap with the Crush tool cache path.

Proposed remediation:

  • Override the Crush binary destination to a writable path (e.g., /tmp/crush-bin/) via the CRUSH_BIN_DIR env var or equivalent, OR
  • Pre-install the Crush binary outside the sandbox before the chroot is applied (during activation job), OR
  • Mount the specific node_modules subdirectory as writable in the sandbox configuration
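
As a rough illustration of the first option, the sketch below probes a list of candidate directories and returns the first one that accepts a real write. The function name `first_writable_dir` and the candidate paths are hypothetical, and whether Crush's installer honors `CRUSH_BIN_DIR` (or any similar variable) is an unverified assumption carried over from the proposal above.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first candidate directory that accepts a real write, or None.

    Hypothetical helper: it only selects a directory, it does not install anything.
    """
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
            # Probe by creating (and automatically removing) a real file,
            # rather than trusting permission bits alone.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
            return path
        except OSError:
            # EROFS, EACCES, ENOTDIR and similar all mean "not usable here".
            continue
    return None
```

A wrapper step could call `first_writable_dir(["/tmp/crush-bin"])` before the install and export the result under whatever variable the installer actually reads, if one exists.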

Gemini — Invalid API Key

The Gemini CLI failed on its first API call with:

{"code": 400, "status": "INVALID_ARGUMENT",
 "message": "API key not valid. Please pass a valid API key.",
 "details": [{"reason": "API_KEY_INVALID", "domain": "googleapis.com"}]}

Both the generateJson (routing) and sendMessageStream (main stream) paths failed with this error at 2026-05-02T01:00:16Z. The Crush run also logged "GEMINI_API_KEY is not set" at startup, which suggests the secret is absent or expired in the CI environment.
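
For triage tooling, a minimal sketch of how this failure mode could be recognized from an error body like the one above. The helper name `is_invalid_api_key` is hypothetical. It accepts both the flat shape shown here and the shape where the same fields sit under a top-level "error" object, on the assumption that either may appear in captured artifacts.

```python
import json


def is_invalid_api_key(payload: str) -> bool:
    """Return True if a JSON error body reports reason API_KEY_INVALID."""
    try:
        body = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not isinstance(body, dict):
        return False
    # Accept either the flat shape or one nested under "error" (assumption).
    err = body.get("error", body)
    if not isinstance(err, dict):
        return False
    details = err.get("details")
    if not isinstance(details, list):
        return False
    return any(
        isinstance(item, dict) and item.get("reason") == "API_KEY_INVALID"
        for item in details
    )
```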

Proposed remediation:

  • Rotate the GEMINI_API_KEY secret in GitHub Actions (check expiry date in Google Cloud Console)
  • Add a startup validation step to the Gemini smoke workflow that checks GEMINI_API_KEY is non-empty before launching the agent, and fails fast with a clear error
  • Consider using a service account key instead of an API key for better lifecycle management
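
The startup validation in the second bullet could look roughly like the sketch below. The names `check_api_key` and `main` are hypothetical and nothing here is taken from the existing workflow. Note that a non-empty check only catches a missing or blank secret; an expired or revoked key would still pass it and fail on the first API call.

```python
import os
import sys


def check_api_key(env, name="GEMINI_API_KEY"):
    """Return an error message if the variable is missing or blank, else None."""
    value = env.get(name)
    if value is None:
        return f"{name} is not set"
    if not value.strip():
        return f"{name} is set but empty"
    return None


def main(env=None):
    """Return a process exit code: 0 if the key is present, 1 otherwise."""
    env = os.environ if env is None else env
    problem = check_api_key(env)
    if problem:
        # Fail fast with a clear message before the agent is launched.
        print(f"error: {problem}; refusing to launch the agent", file=sys.stderr)
        return 1
    return 0
```

A workflow step could run this with `sys.exit(main())` ahead of the agent job so that a missing secret surfaces as an explicit error rather than an HTTP 400 several minutes in.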

Success Criteria

  • Smoke Crush: next run reaches at least 1 agent turn and produces tool call telemetry
  • Smoke Gemini: next run does not emit gemini-client-error-*.json artifacts; agent completes at least 5 turns
  • Both engines: agent job concludes success in the next Smoke CI batch

Generated by [aw] Failure Investigator (6h) · ● 639.1K

  • expires on May 9, 2026, 1:29 AM UTC
