Analysis Period: 2026-06-15 → 2026-06-22 · Runs Analyzed: 21 (2026-06-18: 13 runs, 2026-06-19: 4 runs, 2026-06-22: 4 runs)
Target Workflow
PR Code Quality Reviewer (pr-code-quality-reviewer.md)
Selected as the highest-volume, never-previously-optimized workflow in the 7-day window. It triggers on every `pull_request: ready_for_review` event and on the `/review` slash command, with 13+ runs on peak days. An average of 70.24 AIC/run (from 2026-06-19 data) and 180+ action-minutes on high-volume days make it the top cost target.
Cost Profile
| Metric | Value |
|---|---|
| Total AIC analyzed (4 runs, 2026-06-19) | 280.94 |
| Avg AIC / run | 70.24 |
| Total action-minutes (21 runs) | ~278 min |
| Avg action-minutes / run | ~13 min |
| Avg agent-phase duration | 9.5 min (range: 4.4–15.8 min) |
| Raw tokens | Not available in audit data |
| Avg turns / run | Not available in audit data |
| Cache efficiency | Not available |
| Conclusions (21 runs) | 100% success, 0 errors |
Wide agent-phase variance (4.4–15.8 min) is expected — PR size drives it. The sub-agent invocation (grumpy-coder) contributes a full LLM round-trip on every run.
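The per-run averages in the table follow directly from the reported totals. A minimal Python check, using only the figures above (the variable names are illustrative, not part of the audit data):

```python
# Figures copied from the Cost Profile table.
total_aic = 280.94          # only the 4 runs from 2026-06-19 have AIC data
aic_runs = 4
total_action_minutes = 278  # approximate total across all 21 runs
all_runs = 21

avg_aic_per_run = total_aic / aic_runs
avg_minutes_per_run = total_action_minutes / all_runs

print(f"avg AIC/run: {avg_aic_per_run:.3f}")
print(f"avg action-minutes/run: {avg_minutes_per_run:.1f}")
```

Note that the two averages use different denominators: the AIC average covers 4 runs, while the action-minute average covers all 21.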
Ranked Recommendations
1. Right-size grumpy-coder sub-agent from model: large to model: small · ~15 AIC/run
Action: In the ``## agent: `grumpy-coder` `` block at the bottom of `pr-code-quality-reviewer.md`, change `model: large` to `model: small`.
Rationale: The grumpy-coder task is strictly extractive and classificatory:
- Input: PR diff + changed-file list
- Output: JSONL with a fixed 6-field schema (`path`, `line`, `severity`, `headline`, `impact`, `fix`)
- Severity is a 4-value enum (`critical`, `high`, `medium`, `low`)
This matches the established pattern for `model: small` sub-agents in this repo: `aw-failure-investigator`, `api-consumption-report`, and `dead-code-remover` all use `model: small` for similar extractive sub-tasks.
Risk mitigation built in: The main agent already treats grumpy-coder output as "advisory (not authoritative)" and performs its own independent second pass. If the smaller model produces lower-quality findings, the main agent's pass compensates. The existing field-validation fix-up rules (coerce `line` to int, drop findings with invalid `path`/`severity`) provide additional resilience.
Evidence: 21/21 observed runs succeeded. The sub-agent is invoked once per run and its output is never the sole source of review comments.
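To make the "extractive and classificatory" claim concrete, the schema and fix-up rules described above can be sketched in Python. This is an illustration only: `clean_findings` and `changed_files` are hypothetical names, and treating a path as valid only when it appears in the changed-file list is an assumption, not something the workflow states.

```python
import json

SEVERITIES = {"critical", "high", "medium", "low"}
FIELDS = ("path", "line", "severity", "headline", "impact", "fix")


def clean_findings(jsonl_text, changed_files):
    """Parse JSONL findings, coercing `line` to int and dropping invalid rows."""
    cleaned = []
    for raw in jsonl_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            finding = json.loads(raw)
        except json.JSONDecodeError:
            continue  # assumption: unparseable lines are dropped
        if not isinstance(finding, dict):
            continue
        path = finding.get("path")
        severity = finding.get("severity")
        # Assumption: a path is valid only if it is in the changed-file list.
        if not isinstance(path, str) or path not in changed_files:
            continue
        if not isinstance(severity, str) or severity not in SEVERITIES:
            continue
        try:
            finding["line"] = int(finding.get("line"))
        except (TypeError, ValueError):
            continue  # assumption: rows whose line cannot be coerced are dropped
        cleaned.append({name: finding.get(name) for name in FIELDS})
    return cleaned


sample = "\n".join([
    '{"path": "a.go", "line": "12", "severity": "high", '
    '"headline": "h", "impact": "i", "fix": "f"}',
    '{"path": "a.go", "line": 3, "severity": "urgent", '
    '"headline": "h", "impact": "i", "fix": "f"}',
])
findings = clean_findings(sample, {"a.go"})
```

In this sample the first row is kept with `line` coerced to an integer, and the second is dropped because `urgent` is not one of the four severities. A task that reduces to this kind of mechanical filtering is a plausible fit for a smaller model.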
2. Remove inline Go code example from Step 4 · ~4 AIC/run
Action: In Step 4 "Write Review Comments", remove the 15-line inline Go code example block and replace with a compact 2-line format spec.
Current (approximately 25 lines including surrounding text):
````markdown
Example:
**Potential nil dereference**: `user.Profile` is accessed without a nil check...
<details><summary>💡 Suggested fix</summary>

```go
if user.Profile == nil {
	return ErrNoProfile
}
```

Callers that pass users without profiles...
</details>
````
Proposed replacement (~4 lines):
Format each comment as: one-sentence issue + impact (always visible), followed by detailed explanation and code fix in a `<details><summary>💡 ...</summary>` block.
Rationale: The Review Formatting guideline already specifies the `<details>` structure. The Go code snippet is illustrative boilerplate that adds prompt tokens without adding information the model doesn't already have. All 10 sampled runs produced correctly-structured review comments.
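The target comment shape is simple enough to state as a short function, which is part of why a one-line spec may suffice. The sketch below is illustrative: `format_review_comment` is a hypothetical helper, not part of the workflow, and the default label is taken from the example being removed.

```python
def format_review_comment(summary, details, fix_label="💡 Suggested fix"):
    """Render a comment: always-visible summary, then a collapsed detail block."""
    return (
        f"{summary}\n\n"
        f"<details><summary>{fix_label}</summary>\n\n"
        f"{details}\n\n"
        "</details>"
    )


comment = format_review_comment(
    "**Potential nil dereference**: `user.Profile` is accessed without a nil check.",
    "Return early when the profile is missing.",
)
```

The blank lines around the collapsed content are there so that markdown inside the `<details>` element is rendered rather than treated as raw HTML.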
3. Consolidate Step 2 checklist with ## Guidelines / Review Focus · ~4 AIC/run
Action: Remove the "Review Focus" subsection from ## Guidelines and fold its 3 unique directives into Step 2 and the "Do not flag" list. The 6-bullet checklist in Step 2 ("Logic errors...", "Performance issues...", etc.) already covers the same ground.
Unique content to preserve:
- "Respect time — complete within the 15-minute timeout" → add as a note at the end of Step 5
- "Avoid friendliness padding" → already captured in Step 4 "Tone" section
Rationale: The Review Focus section (6 bullets) overlaps almost entirely with: (a) the 8-item second-pass checklist in Step 2, (b) the "Tone" paragraph in Step 4, and (c) the "Do not flag" list. Removing the duplicate reduces prompt tokens with no functional loss.
4. Add explicit cache-file existence guard in Step 1 · ~2 AIC/run
Action: Change the cache-file read in Step 1 from:
- (Optional) `/tmp/gh-aw/cache-memory/pr-${{ ... }}.json` for past review themes
to:
- `/tmp/gh-aw/cache-memory/pr-${{ ... }}.json` — only if it exists (use `[ -f <path> ]` before reading; skip on first-review runs)
Rationale: Most runs are first-time reviews on new PRs — the cache file won't exist. Without an explicit guard, the agent attempts the read, receives a "file not found" error, and handles it as a tool-call failure. At 13+ runs/day, this is a wasted tool call on the majority of invocations. Clarifying the guard eliminates the wasted round-trip.
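The guard itself is a prompt instruction (the proposed text uses the shell test `[ -f <path> ]`), but the intended control flow can be sketched in Python. `load_review_cache` and the file name `pr-example.json` are hypothetical; the real cache path is elided in the workflow and is not reproduced here.

```python
import json
import tempfile
from pathlib import Path


def load_review_cache(cache_path):
    """Return cached review themes, or None when no cache file exists yet."""
    path = Path(cache_path)
    if not path.is_file():
        return None  # first-review run: skip the read entirely
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # assumption: an unreadable cache is treated as absent


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "pr-example.json"  # hypothetical file name
    missing = load_review_cache(cache_file)     # no file yet
    cache_file.write_text('{"themes": ["error handling"]}', encoding="utf-8")
    present = load_review_cache(cache_file)     # file now exists
```

The point of checking first is that a missing file becomes an ordinary branch rather than an error the agent has to recover from.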
5. Condense grumpy-coder field-validation rules · ~1 AIC/run
Action: In the grumpy-coder sub-agent prompt, replace the 4-line validation block:
If any field is malformed, fix it before returning:
- Coerce `line` to an integer.
- Drop findings with invalid `path` or invalid `severity`.
- Truncate overly long text fields to concise summaries.
with a 1-line equivalent:
Fix malformed output before returning: coerce `line` to int, drop findings with invalid `path` or `severity`, truncate long text fields.
Rationale: Minor token reduction (4 lines → 1). No functional change: the validation rules are preserved.
Caveats
- AIC data coverage: Only 4 of 21 audited runs have AIC measurements (2026-06-19 snapshot only). The 70.24 avg AIC/run is an estimate from a single day; actual average may differ.
- grumpy-coder model change: Validate on 5–10 runs before full rollout. The main agent's second pass provides a backstop, but initial testing should confirm that `model: small` output quality is adequate for the advisory role.
- Agent-phase variance: The 4.4–15.8 min range suggests PR size is the dominant cost driver. Token savings from prompt trimming will be proportionally larger on small PRs.
- Shared imports: `shared/pr-review-base.md` and `shared/pr-code-review-config.md` add overlapping "Review Guidelines" content. These are out of scope for this issue (shared across multiple workflows), but they represent an additional consolidation opportunity for a separate effort.
Total estimated savings: ~26 AIC/run · At 10–13 runs/day → 260–338 AIC/day
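The total is the sum of the five per-recommendation estimates, scaled by the daily run range. A short Python check of that arithmetic (the dictionary keys are shorthand labels, not names from the workflow):

```python
# Estimated savings per run, in AIC, from the five recommendations above.
savings_per_run = {
    "right-size grumpy-coder": 15,
    "remove inline Go example": 4,
    "consolidate Review Focus": 4,
    "cache-file existence guard": 2,
    "condense validation rules": 1,
}

total_per_run = sum(savings_per_run.values())
daily_low, daily_high = (total_per_run * runs for runs in (10, 13))

print(f"{total_per_run} AIC/run, {daily_low}-{daily_high} AIC/day")
```

These are estimates, and per the first caveat the 70.24 AIC/run baseline they are measured against comes from a single day of data.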
Generated by Agentic Workflow AIC Usage Optimizer · 148 AIC · ⊞ 7.1K · ◷