[audit-workflows] Agentic Workflow Audit — 2026-05-01 #29632

2026-05-01T21:39:34Z

github-actions[bot]
Bot May 1, 2026

Overview

19 workflow runs in the last 24 hours (9 Claude, 10 Copilot). Overall health is strong at 94.7% success rate — 18 successes, 1 failure. Total cost was $4.25 with 10.03M tokens consumed (7.30M effective). The single failure was a max_turns exhaustion in "Design Decision Gate" triggered by an early security block that disrupted its normal execution path.

Workflow Health Trends

Today marks the first day of historical tracking in this repo memory store. The baseline is 94.7% success across 19 runs. Future audits will show trend direction as data accumulates.

Token & Cost Trends

10.03M raw tokens / 7.30M effective tokens today, $4.25 total cost. Cache efficiency is split by engine: Claude runs achieve 91–99% cache hit rate while Copilot runs average ~47%. The 7-day moving average will populate as daily data accumulates.
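How the audit agent computes its 7-day moving average is not described in this report. As a rough sketch only, a trailing average that tolerates fewer than seven days of history might look like the following. The function name `trailing_average` and the list layout are assumptions made for illustration.

```python
from statistics import mean


def trailing_average(daily_values, window=7):
    """Mean of the most recent `window` values, or None when no data exists.

    With fewer than `window` days recorded, the average is taken over
    whatever is available, so day one simply returns that day's value.
    """
    if not daily_values:
        return None
    return mean(daily_values[-window:])


# Day-one history: only the 2026-05-01 totals from this audit are available.
daily_cost_usd = [4.25]
daily_effective_tokens = [7_295_476]

print(trailing_average(daily_cost_usd))
print(trailing_average(daily_effective_tokens))
```

Averaging over the available days keeps the trend line populated from day one, at the cost of a noisier value until a full week has accumulated.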

Run Summary (2026-05-01)

Metric	Value
Total runs	19
Success / Failure	18 / 1
Success rate	94.7%
Total tokens (raw)	10,032,409
Total tokens (effective)	7,295,476
Total cost	$4.25
Total action minutes	129
Total turns	245
GitHub API calls	73
Missing tools	0
Missing data	0

Engine distribution: Claude Code (9 runs), GitHub Copilot CLI (10 runs)

❌ Failure Analysis

Run §25232335913 — Design Decision Gate (Claude Code)

Error: error_max_turns — agent exhausted the 12-turn limit without completing
Root cause: An early security block (cat of /tmp/gh-aw/agent/adr-prefetch-summary.json was blocked) disrupted the normal 4-turn execution flow, causing the agent to spiral into retries and fallback paths
Impact: 1,049,606 tokens consumed, about 4.9× the normal ~215,000 for this workflow; cost $0.85 vs. a typical $0.23
Non-retriable: The harness correctly identified isMaxTurnsExit=true and did not retry

Design Decision Gate: 5-run comparison

Run	Status	Tokens	Cost	Turns	Duration
25233623539	✅ success	215,141	$0.22	4	4.2m
25233603237	✅ success	215,237	$0.29	5	5.2m
25232646514	✅ success	214,871	$0.23	4	4.6m
25232030752	✅ success	214,899	$0.23	4	4.3m
25232335913	❌ failure	1,049,606	$0.85	13	8.3m

The healthy runs are extremely consistent (~215K tokens, 4–5 turns). The failure is an isolated outlier caused by the security block cascading into turn exhaustion.
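The outlier can be quantified by comparing each run against the median of the successful runs. This is a minimal sketch using the figures from the comparison table; the 2.0 threshold is a hypothetical cutoff, not something the audit agent is known to use.

```python
from statistics import median

# (run id, status, tokens, turns) copied from the 5-run comparison table.
runs = [
    ("25233623539", "success", 215_141, 4),
    ("25233603237", "success", 215_237, 5),
    ("25232646514", "success", 214_871, 4),
    ("25232030752", "success", 214_899, 4),
    ("25232335913", "failure", 1_049_606, 13),
]

# Baseline is the median token count of the successful runs only.
baseline = median(tokens for _, status, tokens, _ in runs if status == "success")

OUTLIER_THRESHOLD = 2.0  # hypothetical cutoff, chosen for illustration

outliers = [
    (run_id, round(tokens / baseline, 2))
    for run_id, _, tokens, _ in runs
    if tokens / baseline >= OUTLIER_THRESHOLD
]

print(baseline)   # 215020.0
print(outliers)   # [('25232335913', 4.88)]
```

Using the median rather than the mean keeps the baseline stable even if a failed run is accidentally included in the healthy set.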

Recommendation: Increase max_turns for Design Decision Gate to 20, or add graceful degradation when a file read is blocked (skip the read and proceed with the available context). Healthy runs need only 4–5 turns, but once the read was blocked the rest of the 12-turn budget was not enough for the agent to recover.
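The graceful-degradation option could be sketched roughly as below. This assumes the blocked read surfaces as an ordinary I/O error in whatever code loads the prefetch file, which may not match how the sandbox reports a security block. The helper names `load_prefetch_summary` and `build_context` and the `degraded` flag are hypothetical.

```python
import json
from pathlib import Path


def load_prefetch_summary(path):
    """Return the parsed summary, or None if it cannot be read.

    A blocked, missing, or malformed file is treated as "no prefetch
    data" rather than as an error worth retrying.
    """
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError covers permission and missing-file errors;
        # ValueError covers invalid JSON and bad encodings.
        return None


def build_context(path):
    """Build the agent context, marking it degraded when the read fails."""
    summary = load_prefetch_summary(path)
    return {"adr_summary": summary, "degraded": summary is None}


context = build_context("/tmp/gh-aw/agent/adr-prefetch-summary.json")
print("degraded:", context["degraded"])
```

The point of the `degraded` flag is that the agent gets one clear signal to continue without the summary instead of spending turns on retries.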

⚠️ Performance Observations

High-Token Runs

Workflow	Run	Tokens (raw)	Effective	Turns	Cost
Daily Project Perf Summary Generator	25232977734	1,553,769	1,731,431	34	—
Lockfile Statistics Analysis Agent	25231777137	1,348,801	259,290	45	$0.94
Contribution Check	25233221584	1,251,579	1,436,234	48	—
Design Decision Gate (failed)	25232335913	1,049,606	202,022	13	$0.85

Notable: Lockfile Statistics Analysis Agent has exceptional cache efficiency (99.4%): 1.35M raw tokens collapse to 259K effective. This is the right pattern for long-running agents.

Concern: Contribution Check and Daily Project Perf Summary Generator both report more effective tokens than raw tokens, which points to poor Copilot cache utilization. Each run re-reads the same large context without any caching benefit.
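A simple way to surface the runs described above is to compare effective tokens to raw tokens per run. The report does not define how effective tokens are derived, so this sketch only computes the ratio of the two published columns and flags anything above 1.0.

```python
# (workflow, raw tokens, effective tokens) from the High-Token Runs table.
high_token_runs = [
    ("Daily Project Perf Summary Generator", 1_553_769, 1_731_431),
    ("Lockfile Statistics Analysis Agent", 1_348_801, 259_290),
    ("Contribution Check", 1_251_579, 1_436_234),
    ("Design Decision Gate (failed)", 1_049_606, 202_022),
]


def effective_to_raw(raw, effective):
    """Ratio of effective to raw tokens; above 1.0 means no cache savings."""
    return effective / raw


# Workflows whose effective tokens exceed their raw tokens.
flagged = [
    name
    for name, raw, effective in high_token_runs
    if effective_to_raw(raw, effective) > 1.0
]

for name, raw, effective in high_token_runs:
    print(f"{name}: {effective_to_raw(raw, effective):.2f}")
print(flagged)
```

On the table's figures this flags the two Copilot workflows named in the concern, while the two Claude runs come out at roughly 0.19.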

Turn Count Volatility — Test Quality Sentinel

Across 5 Copilot runs of the same workflow, turns ranged from 3 to 15 (5× swing):

Run	Tokens	Turns	Duration
25232646515	117,081	3	4.7m
25233623558	365,881	7	5.8m
25232030746	579,175	12	6.2m
25232335879	708,315	15	7.2m
25233603244	605,959	15	6.9m

High turn variance suggests prompt instability or task-shape drift. The 3-turn run used roughly 5–6× fewer tokens than the two 15-turn runs (117,081 vs. 605,959 and 708,315). It is worth investigating whether input size or PR complexity is the driver, or whether the cause is non-determinism in agent planning.
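The swing in turns and tokens can be checked directly from the table. This is a small sketch over the five published runs; tokens per turn is included as one possible way to separate "more turns" from "heavier turns", not as a metric the audit agent reports.

```python
# (run id, tokens, turns) from the Test Quality Sentinel table.
sentinel_runs = [
    ("25232646515", 117_081, 3),
    ("25233623558", 365_881, 7),
    ("25232030746", 579_175, 12),
    ("25232335879", 708_315, 15),
    ("25233603244", 605_959, 15),
]

tokens = [t for _, t, _ in sentinel_runs]
turns = [n for _, _, n in sentinel_runs]

# Ratio between the largest and smallest observed values.
turn_swing = max(turns) / min(turns)
token_swing = max(tokens) / min(tokens)

# Average tokens spent per turn, rounded down, for each run.
tokens_per_turn = {run_id: t // n for run_id, t, n in sentinel_runs}

print(turn_swing)              # 5.0
print(round(token_swing, 2))   # 6.05
print(tokens_per_turn)
```

Tokens per turn stays within a fairly narrow band (about 39K to 52K) across these runs, which suggests the token cost scales mostly with the number of turns rather than with the size of each turn.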

Firewall Block Patterns

Two Copilot workflows had significant outbound request blocking:

Daily Project Performance Summary Generator: 34/84 requests blocked (40%) — all from (unknown) domain
Copilot PR Prompt Pattern Analysis: 9/19 requests blocked (47%) — all from (unknown) domain

An (unknown) domain likely indicates DNS resolution failures for outbound endpoints that are not on the sandbox allowlist. These workflows may be attempting to fetch external resources (APIs, URLs) that aren't in the allowed-domains list. The other 17 runs had 0 blocked requests.
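The block rates quoted above follow from the blocked and total request counts. A minimal sketch of that arithmetic, with a hypothetical 25% alert threshold that is not part of the audit configuration:

```python
# (blocked requests, total requests) per workflow, from the figures above.
firewall_stats = {
    "Daily Project Performance Summary Generator": (34, 84),
    "Copilot PR Prompt Pattern Analysis": (9, 19),
}

ALERT_THRESHOLD = 0.25  # hypothetical cutoff, chosen for illustration


def block_rate(blocked, total):
    """Fraction of outbound requests that were blocked (0.0 if none were made)."""
    return blocked / total if total else 0.0


# Workflows whose block rate is at or above the alert threshold.
needs_review = [
    name
    for name, (blocked, total) in firewall_stats.items()
    if block_rate(blocked, total) >= ALERT_THRESHOLD
]

for name, (blocked, total) in firewall_stats.items():
    print(f"{name}: {block_rate(blocked, total):.1%}")
print(needs_review)
```

Guarding against a zero total matters here because most runs in this audit made no blocked requests, and some may make no outbound requests at all.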

Missing Tools

None. total_missing_tools: 0 across all 19 runs.

Recommendations

Design Decision Gate: increase max_turns or add a file-read fallback. The current 12-turn limit was too tight for the agent to recover once an early step was blocked. Either raise it to 20 turns or make the ADR prefetch step skip gracefully when blocked.
Investigate Copilot workflows with firewall blocks: "Daily Project Performance Summary Generator" and "Copilot PR Prompt Pattern Analysis" had 40–47% of their outbound requests blocked, all attributed to an (unknown) domain. Audit which URLs they are trying to reach, then either add those hosts to allowed-domains or remove the external calls.
Test Quality Sentinel turn variance: Investigate whether PR complexity correlates with turn count. If so, consider prompt improvements to cap reasoning steps; if not, investigate non-determinism.
Seed historical metrics: This is the first audit with repo memory tracking. Trends will become meaningful after a week of data collection.

References:

§25232335913 — Design Decision Gate failure (max_turns)
§25232977734 — Daily Project Performance Summary Generator (high tokens)
§25233221584 — Contribution Check (high effective tokens)

Generated by Agentic Workflow Audit Agent · ● 552K · ◷

expires on May 2, 2026, 9:39 PM UTC

pelikhan · 2026-05-01T22:51:52Z

pelikhan
May 1, 2026
Maintainer

/q review and fix cache-memory handling

1 reply

github-actions[bot] Bot May 1, 2026
Author

🔧 Pay attention, 007! Q is preparing your gadgets for this discussion comment...

2026-05-01T23:02:17Z

github-actions[bot]
Bot May 1, 2026
Author

💥 WHOOSH! ⚡ KA-POW! 🦸

The Smoke Test Agent swoops in from the agentic pipeline!

ZZZAP! Run §25236550777 — CLAUDE ENGINE: NOMINAL!

"With great automation comes great responsibility."

🎉 All systems checked! MCP tools, Serena, Playwright, Tavily — ALL GREEN! The smoke test agent was here and left only victories in its wake! BLAMMO! 💫

💥 [THE END] — Illustrated by Smoke Claude · ● 380.4K · ◷

0 replies

2026-05-02T21:43:51Z

github-actions[bot]
Bot May 2, 2026
Author

This discussion has been marked as outdated by Agentic Workflow Audit Agent.

A newer discussion is available at Discussion #29836.

0 replies
