[audit-workflows] Agentic Workflow Audit — 2026-05-01 #29632
/q review and fix cache-memory handling
💥 WHOOSH! ⚡ KA-POW! 🦸 The Smoke Test Agent swoops in from the agentic pipeline! ZZZAP! Run §25236550777 — CLAUDE ENGINE: NOMINAL!
🎉 All systems checked! MCP tools, Serena, Playwright, Tavily — ALL GREEN! The smoke test agent was here and left only victories in its wake! BLAMMO! 💫
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #29836.
Overview
19 workflow runs in the last 24 hours (9 Claude, 10 Copilot). Overall health is strong at a 94.7% success rate (18 successes, 1 failure). Total cost was $4.25 with 10.03M tokens consumed (7.30M effective). The single failure was a max_turns exhaustion in "Design Decision Gate", triggered by an early security block that disrupted its normal execution path.
Workflow Health Trends
Today marks the first day of historical tracking in this repo memory store. The baseline is 94.7% success across 19 runs. Future audits will show trend direction as data accumulates.
Token & Cost Trends
10.03M raw tokens / 7.30M effective tokens today, $4.25 total cost. Cache efficiency is split by engine: Claude runs achieve 91–99% cache hit rate while Copilot runs average ~47%. The 7-day moving average will populate as daily data accumulates.
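As a rough illustration of how the 7-day moving average could fill in while history is still short, here is a minimal sketch. It assumes a trailing mean over however many days exist so far (up to seven); the function name `trailing_average` and the list layout are hypothetical, not taken from the audit tooling.

```python
from collections import deque
from typing import Iterable, List


def trailing_average(values: Iterable[float], window: int = 7) -> List[float]:
    """Return the trailing mean over at most `window` most recent values.

    Early entries average over fewer points, so the series is defined
    from the first day of tracking onward.
    """
    recent = deque(maxlen=window)  # drops the oldest value automatically
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Only the first data point (94.7% from this audit) is real;
# later days would be appended as audits accumulate.
daily_success_rate = [94.7]
print(trailing_average(daily_success_rate))
```

With a single day of data the average equals that day's value, which is why the trend only becomes informative after several audits.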
Run Summary (2026-05-01)
Engine distribution: Claude Code (9 runs), GitHub Copilot CLI (10 runs)
❌ Failure Analysis
Run §25232335913 — Design Decision Gate (Claude Code)
error_max_turns: the agent exhausted the 12-turn limit without completing.
An early security block (cat of /tmp/gh-aw/agent/adr-prefetch-summary.json was blocked) disrupted the normal 4-turn execution flow, causing the agent to spiral into retries and fallback paths.
The run ended with isMaxTurnsExit=true and did not retry.
Design Decision Gate: 5-run comparison
The healthy runs are extremely consistent (~215K tokens, 4 turns). The failure is an isolated outlier caused by the security block cascading into turn exhaustion.
Recommendation: Increase max_turns for Design Decision Gate to 20, or add graceful degradation when a file read is blocked (skip the read and proceed with available context). The current 12-turn limit leaves no headroom when a step fails and requires recovery.
High-Token Runs
Notable: Lockfile Statistics Analysis Agent has exceptional cache efficiency (99.4%): 1.35M raw tokens collapse to 259K effective. This is the right pattern for long-running agents.
Concern: Contribution Check and Daily Project Perf Summary Generator both have effective tokens exceeding raw tokens (poor Copilot cache utilization). Each run re-reads the same large context without caching benefit.
Turn Count Volatility — Test Quality Sentinel
Across 5 Copilot runs of the same workflow, turns ranged from 3 to 15 (5× swing):
High turn variance suggests prompt instability or task-shape drift. The 3-turn run used 7× fewer tokens than the 15-turn runs. Worth investigating whether input size or PR complexity is the driver, or if it's non-determinism in agent planning.
Firewall Block Patterns
Two Copilot workflows had significant outbound request blocking:
The (unknown) domain likely indicates DNS resolution failures for non-whitelisted outbound endpoints inside the sandbox. These workflows may be attempting to fetch external resources (APIs, URLs) that aren't in the allowed-domains list. The other 17 workflows had 0 blocked requests.
Missing Tools
None. total_missing_tools: 0 across all 19 runs.
Recommendations
Increase max_turns for Design Decision Gate or add a file-read fallback: The current 12-turn limit has zero recovery headroom when an early step is blocked. Either raise to 20 turns or make the ADR prefetch step gracefully skip when blocked.
Investigate Copilot workflows with firewall blocks: "Daily Project Performance Summary Generator" and "Copilot PR Prompt Pattern Analysis" are hitting unknown/blocked domains at 40-47% rates. Audit what URLs they're trying to reach and either add them to allowed-domains or remove the external calls.
Test Quality Sentinel turn variance: Investigate whether PR complexity correlates with turn count. If so, consider prompt improvements to cap reasoning steps; if not, investigate non-determinism.
Seed historical metrics: This is the first audit with repo memory tracking. Trends will become meaningful after a week of data collection.
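The "gracefully skip when blocked" option for the ADR prefetch step could look roughly like the sketch below. It assumes the summary is read by a script step in which a blocked or missing file surfaces as an `OSError`; how the actual security block presents itself to the agent is not stated in this audit, and the helper name `load_optional_json` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_optional_json(path: str) -> Optional[Any]:
    """Return parsed JSON, or None when the file cannot be read or parsed.

    Assumption: a blocked read shows up as an OSError (for example
    PermissionError or FileNotFoundError), not as a hard crash.
    """
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        # Log and continue instead of retrying, so no turns are spent on recovery.
        print(f"skipping {path}: {exc}")
        return None


# Path taken from the failure analysis; falls back to empty context when unavailable.
summary = load_optional_json("/tmp/gh-aw/agent/adr-prefetch-summary.json")
context = summary if summary is not None else {}
```

The design choice here is to treat the prefetch summary as optional input, so a blocked read costs one step rather than triggering a chain of retries.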