[agentic-token-optimizer] Optimize PR Code Quality Reviewer: sub-agent model right-sizing + prompt verbosity reduction #40812

**Analysis Period**: 2026-06-15 → 2026-06-22 · **Runs Analyzed**: 21 (2026-06-18: 13 runs, 2026-06-19: 4 runs, 2026-06-22: 4 runs)

### Target Workflow

**PR Code Quality Reviewer** ([`pr-code-quality-reviewer.md`](https://github.com/github/gh-aw/blob/main/.github/workflows/pr-code-quality-reviewer.md))

Selected as the highest-volume, never-previously-optimized workflow in the 7-day window. Triggers on every `pull_request: ready_for_review` event and `/review` slash command — 13+ runs on peak days. Average AIC of 70.24/run (from 2026-06-19 data) and 180+ action-minutes on high-volume days make it the top cost target.

### Cost Profile

| Metric | Value |
|---|---|
| Total AIC analyzed (4 runs, 2026-06-19) | 280.94 |
| Avg AIC / run | 70.24 |
| Total action-minutes (21 runs) | ~278 min |
| Avg action-minutes / run | ~13 min |
| Avg agent-phase duration | 9.5 min (range: 4.4–15.8 min) |
| Raw tokens | Not available in audit data |
| Avg turns / run | Not available in audit data |
| Cache efficiency | Not available |
| Conclusions (21 runs) | 100% success, 0 errors |

Wide agent-phase variance (4.4–15.8 min) is expected — PR size drives it. The sub-agent invocation (grumpy-coder) contributes a full LLM round-trip on every run.

### Ranked Recommendations

#### 1. Right-size `grumpy-coder` sub-agent from `model: large` to `model: small` · **~15 AIC/run**

**Action**: In the `## agent: \`grumpy-coder\`` block at the bottom of `pr-code-quality-reviewer.md`, change:
```yaml
model: large
```
to:
```yaml
model: small
```

**Rationale**: The grumpy-coder task is strictly extractive and classificatory:
- Input: PR diff + changed-file list
- Output: JSONL with a fixed 6-field schema (`path`, `line`, `severity`, `headline`, `impact`, `fix`)
- Severity is a 4-value enum (`critical`, `high`, `medium`, `low`)

This matches the established pattern for `model: small` sub-agents in this repo — `aw-failure-investigator`, `api-consumption-report`, `dead-code-remover` all use `model: small` for similar extractive sub-tasks.

**Risk mitigation built in**: The main agent already treats grumpy-coder output as "advisory (not authoritative)" and performs its own independent second pass. If the smaller model produces lower-quality findings, the main agent's pass compensates. The existing field-validation fix-up rules (`Coerce line to int`, `Drop findings with invalid path/severity`) provide additional resilience.

**Evidence**: 21/21 observed runs succeeded. The sub-agent is invoked once per run and its output is never the sole source of review comments.
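One way to check that `model: small` stays adequate in its advisory role is to compare the findings each model size emits for the same PR. The sketch below is an assumption, not part of the workflow: `load_findings` and `overlap_report` are hypothetical helpers, and matching findings on `(path, line, severity)` is an illustrative choice rather than an established metric.

```python
import json


def load_findings(jsonl_text):
    """Parse JSONL findings into a set of (path, line, severity) keys."""
    keys = set()
    for raw in jsonl_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        finding = json.loads(raw)
        keys.add((finding["path"], int(finding["line"]), finding["severity"]))
    return keys


def overlap_report(large_jsonl, small_jsonl):
    """Summarize how far the small model's findings match the large model's."""
    large = load_findings(large_jsonl)
    small = load_findings(small_jsonl)
    shared = large & small
    return {
        "shared": len(shared),
        "only_large": len(large - small),
        "only_small": len(small - large),
        # Share of the large model's findings that the small model also reported.
        "recall_vs_large": len(shared) / len(large) if large else 1.0,
    }
```

A low `recall_vs_large` across the trial runs would suggest the main agent's second pass is carrying more of the load than intended; what counts as "low" is a judgment call this sketch does not settle.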

#### 2. Remove inline Go code example from Step 4 · **~4 AIC/run**

**Action**: In Step 4 "Write Review Comments", remove the 15-line inline Go code example block and replace it with a compact format spec of a few lines.

Current (approximately 25 lines including surrounding text):
`````markdown
Example:
````markdown
**Potential nil dereference**: `user.Profile` is accessed without a nil check...
<details><summary>💡 Suggested fix</summary>
```go
if user.Profile == nil {
    return ErrNoProfile
}
```
Callers that pass users without profiles...
</details>
````
`````

**Proposed replacement** (~4 lines):
```markdown
Format each comment as: one-sentence issue + impact (always visible), followed by detailed explanation and code fix in a `<details><summary>💡 ...</summary>` block.
```

**Rationale**: The Review Formatting guideline already specifies the `<details>` structure. The Go code snippet is illustrative boilerplate that adds prompt tokens without adding information the model doesn't already have. All 10 sampled runs produced correctly-structured review comments.

#### 3. Consolidate Step 2 checklist with `## Guidelines / Review Focus` · **~4 AIC/run**

**Action**: Remove the "Review Focus" subsection from `## Guidelines` and fold its unique directives into Step 2 and the "Do not flag" list. The checklist in Step 2 ("Logic errors...", "Performance issues...", etc.) already covers the same ground.

Unique content to preserve:
- "Respect time — complete within the 15-minute timeout" → add as a note at the end of Step 5
- "Avoid friendliness padding" → already captured in Step 4 "Tone" section

**Rationale**: The Review Focus section (6 bullets) overlaps almost entirely with: (a) the 8-item second-pass checklist in Step 2, (b) the "Tone" paragraph in Step 4, and (c) the "Do not flag" list. Removing the duplicate reduces prompt tokens with no functional loss.

#### 4. Add explicit cache-file existence guard in Step 1 · **~2 AIC/run**

**Action**: Change the cache-file read in Step 1 from:
```
- (Optional) `/tmp/gh-aw/cache-memory/pr-${{ ... }}.json` for past review themes
```
to:
```
- `/tmp/gh-aw/cache-memory/pr-${{ ... }}.json` — only if it exists (use `[ -f <path> ]` before reading; skip on first-review runs)
```

**Rationale**: Most runs are first-time reviews on new PRs — the cache file won't exist. Without an explicit guard, the agent attempts the read, receives a "file not found" error, and handles it as a tool-call failure. At 13+ runs/day, this is a wasted tool call on the majority of invocations. Clarifying the guard eliminates the wasted round-trip.
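The guard amounts to an existence check before the read, so that a missing file is a normal outcome rather than an error. A minimal Python sketch of the same logic follows; `read_review_cache` is a hypothetical helper, and treating the cache as a JSON document is an assumption based only on the `.json` extension.

```python
import json
from pathlib import Path


def read_review_cache(cache_path):
    """Return cached review themes, or None when no cache file exists."""
    path = Path(cache_path)
    if not path.is_file():  # same check as `[ -f <path> ]` in the shell
        return None  # first-review run: nothing to load, not a failure
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Returning `None` instead of raising keeps the first-review case on the normal path, which is the behavior the reworded prompt line asks the agent to follow.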

#### 5. Condense grumpy-coder field-validation rules · **~1 AIC/run**

**Action**: In the grumpy-coder sub-agent prompt, replace the 4-line validation block:
```
If any field is malformed, fix it before returning:
- Coerce `line` to an integer.
- Drop findings with invalid `path` or invalid `severity`.
- Truncate overly long text fields to concise summaries.
```
with a 1-line equivalent:
```
Fix malformed output before returning: coerce `line` to int, drop findings with invalid `path` or `severity`, truncate long text fields.
```

**Rationale**: Minor token reduction (4 lines → 1). No functional change: the validation rules are preserved.
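To make explicit what the condensed line still has to express, the three rules can be written as a small function. This is a sketch under stated assumptions, not the sub-agent's actual behavior: `clean_finding` and `MAX_TEXT_LEN` are hypothetical, "invalid `path`" is taken to mean "not in the changed-file list", and a `line` value that cannot be coerced is dropped.

```python
SEVERITIES = {"critical", "high", "medium", "low"}
TEXT_FIELDS = ("headline", "impact", "fix")
MAX_TEXT_LEN = 200  # hypothetical limit; the prompt only says "overly long"


def clean_finding(finding, changed_paths):
    """Apply the three fix-up rules; return the cleaned finding or None if dropped."""
    # Rule 2: drop findings with an invalid path or severity.
    if finding.get("path") not in changed_paths:
        return None
    if finding.get("severity") not in SEVERITIES:
        return None
    # Rule 1: coerce `line` to an integer.
    try:
        line = int(finding.get("line"))
    except (TypeError, ValueError):
        return None  # assumption: an uncoercible line means the finding is dropped
    cleaned = dict(finding, line=line)
    # Rule 3: truncate overly long text fields.
    for field in TEXT_FIELDS:
        text = str(cleaned.get(field, ""))
        if len(text) > MAX_TEXT_LEN:
            text = text[: MAX_TEXT_LEN - 1].rstrip() + "…"
        cleaned[field] = text
    return cleaned
```

All three rules survive in the one-line prompt form, which is why the condensation is described as having no functional change.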

### Caveats

- **AIC data coverage**: Only 4 of 21 audited runs have AIC measurements (2026-06-19 snapshot only). The 70.24 avg AIC/run is an estimate from a single day; actual average may differ.
- **grumpy-coder model change**: Validate on 5–10 runs before full rollout. The main agent's second pass provides a backstop, but initial testing should confirm that `model: small` output quality is adequate for the advisory role.
- **Agent-phase variance**: The 4.4–15.8 min range suggests PR size is the dominant cost driver. Token savings from prompt trimming will be proportionally larger on small PRs.
- **Shared imports**: `shared/pr-review-base.md` and `shared/pr-code-review-config.md` add overlapping "Review Guidelines" content. These are out of scope for this issue (shared across multiple workflows), but they represent additional consolidation opportunity in a separate effort.

**Total estimated savings**: ~26 AIC/run · At 10–13 runs/day (peak-day volume; the other two observed days had 4 runs each) → **260–338 AIC/day**
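The totals follow directly from the per-recommendation estimates. The short sketch below reproduces the arithmetic; the dictionary labels and the `daily_savings` helper are illustrative names, and the inputs are the estimates stated in this issue, not new measurements.

```python
# Per-run savings estimates (AIC) from recommendations 1-5.
savings_per_run = {
    "right-size grumpy-coder": 15,
    "remove inline Go example": 4,
    "consolidate Review Focus": 4,
    "cache-file existence guard": 2,
    "condense validation rules": 1,
}
AVG_AIC_PER_RUN = 70.24  # from the 4 runs measured on 2026-06-19

total_per_run = sum(savings_per_run.values())


def daily_savings(runs_per_day):
    """Estimated AIC saved per day at a given run volume."""
    return total_per_run * runs_per_day


# Fraction of the average per-run cost that the combined changes would remove.
share_of_avg = total_per_run / AVG_AIC_PER_RUN
```

Here `total_per_run` is 26, and `daily_savings(10)` and `daily_savings(13)` give the 260 and 338 bounds. Against the 70.24 average, the combined estimate is roughly 37% of per-run cost, subject to the same single-day coverage caveat as the average itself.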

**References:**
- [§27933865956](https://github.com/github/gh-aw/actions/runs/27933865956) — 10.2 min agent phase (mid-size PR)
- [§27920793785](https://github.com/github/gh-aw/actions/runs/27920793785) — 15.8 min agent phase (largest observed)
- [§27961144129](https://github.com/github/gh-aw/actions/runs/27961144129) — 4.4 min agent phase (smallest observed)

> Generated by [Agentic Workflow AIC Usage Optimizer](https://github.com/github/gh-aw/actions/runs/27967320242) · 148 AIC · ⊞ 7.1K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fagentic-token-optimizer%22&type=issues)
> - [x] expires on Jun 29, 2026, 8:35 AM UTC-08:00
