[agentic-token-optimizer] Optimize PR Code Quality Reviewer: sub-agent model right-sizing + prompt verbosity reduction #40812

Description

@github-actions

Analysis Period: 2026-06-15 → 2026-06-22 · Runs Analyzed: 21 (2026-06-18: 13 runs, 2026-06-19: 4 runs, 2026-06-22: 4 runs)

Target Workflow

PR Code Quality Reviewer (pr-code-quality-reviewer.md)

Selected as the highest-volume, never-previously-optimized workflow in the 7-day window. Triggers on every pull_request: ready_for_review event and /review slash command — 13+ runs on peak days. Average AIC of 70.24/run (from 2026-06-19 data) and 180+ action-minutes on high-volume days make it the top cost target.

Cost Profile

| Metric | Value |
| --- | --- |
| Total AIC analyzed (4 runs, 2026-06-19) | 280.94 |
| Avg AIC / run | 70.24 |
| Total action-minutes (21 runs) | ~278 min |
| Avg action-minutes / run | ~13 min |
| Avg agent-phase duration | 9.5 min (range: 4.4–15.8 min) |
| Raw tokens | Not available in audit data |
| Avg turns / run | Not available in audit data |
| Cache efficiency | Not available |
| Conclusions (21 runs) | 100% success, 0 errors |

Wide agent-phase variance (4.4–15.8 min) is expected — PR size drives it. The sub-agent invocation (grumpy-coder) contributes a full LLM round-trip on every run.
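The per-run averages in the table follow directly from the totals. A minimal sketch of that arithmetic, using only figures from the table (the variable names are illustrative, not from the audit tooling). `Decimal` with half-up rounding is used so that 70.235 rounds to the reported 70.24:

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures copied from the Cost Profile table.
total_aic = Decimal("280.94")    # 4 runs, 2026-06-19 only
aic_runs = 4
total_minutes = Decimal("278")   # approximate, all 21 runs
all_runs = 21

avg_aic = (total_aic / aic_runs).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP)
avg_minutes = (total_minutes / all_runs).quantize(
    Decimal("0.1"), rounding=ROUND_HALF_UP)
# Share of audited runs that actually carry an AIC measurement.
aic_coverage = (Decimal(aic_runs) / all_runs * 100).quantize(
    Decimal("1"), rounding=ROUND_HALF_UP)

print(f"avg AIC/run: {avg_aic}")
print(f"avg action-minutes/run: {avg_minutes}")
print(f"runs with AIC data: {aic_coverage}%")
```

Only about a fifth of the audited runs contribute to the AIC average, so the per-run figure is best read as a single-day estimate.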

Ranked Recommendations

1. Right-size grumpy-coder sub-agent from model: large to model: small · ~15 AIC/run

Action: In the ``## agent: `grumpy-coder` `` block at the bottom of `pr-code-quality-reviewer.md`, change:

model: large

to:

model: small

Rationale: The grumpy-coder task is strictly extractive and classificatory:

  • Input: PR diff + changed-file list
  • Output: JSONL with a fixed 6-field schema (path, line, severity, headline, impact, fix)
  • Severity is a 4-value enum (critical, high, medium, low)

This matches the established pattern for model: small sub-agents in this repo — aw-failure-investigator, api-consumption-report, dead-code-remover all use model: small for similar extractive sub-tasks.

Risk mitigation built in: The main agent already treats grumpy-coder output as "advisory (not authoritative)" and performs its own independent second pass. If the smaller model produces lower-quality findings, the main agent's pass compensates. The existing field-validation fix-up rules (Coerce line to int, Drop findings with invalid path/severity) provide additional resilience.

Evidence: 21/21 observed runs succeeded. The sub-agent is invoked once per run and its output is never the sole source of review comments.

2. Remove inline Go code example from Step 4 · ~4 AIC/run

Action: In Step 4 "Write Review Comments", remove the 15-line inline Go code example block and replace with a compact 2-line format spec.

Current (approximately 25 lines including surrounding text):

Example:
````markdown
**Potential nil dereference**: `user.Profile` is accessed without a nil check...
<details><summary>💡 Suggested fix</summary>

```go
if user.Profile == nil {
    return ErrNoProfile
}
```

Callers that pass users without profiles...

</details>
````

Proposed replacement (~4 lines):

Format each comment as: one-sentence issue + impact (always visible), followed by detailed explanation and code fix in a `<details><summary>💡 ...</summary>` block.

Rationale: The Review Formatting guideline already specifies the <details> structure. The Go code snippet is illustrative boilerplate that adds prompt tokens without adding information the model doesn't already have. All 10 sampled runs produced correctly-structured review comments.

3. Consolidate Step 2 checklist with ## Guidelines / Review Focus · ~4 AIC/run

Action: Remove the "Review Focus" subsection from ## Guidelines and fold its unique directives into the existing steps and the "Do not flag" list. The second-pass checklist in Step 2 ("Logic errors...", "Performance issues...", etc.) already covers the same ground.

Unique content to preserve:

  • "Respect time — complete within the 15-minute timeout" → add as a note at the end of Step 5
  • "Avoid friendliness padding" → already captured in Step 4 "Tone" section

Rationale: The Review Focus section (6 bullets) overlaps almost entirely with: (a) the 8-item second-pass checklist in Step 2, (b) the "Tone" paragraph in Step 4, and (c) the "Do not flag" list. Removing the duplicate reduces prompt tokens with no functional loss.

4. Add explicit cache-file existence guard in Step 1 · ~2 AIC/run

Action: Change the cache-file read in Step 1 from:

- (Optional) `/tmp/gh-aw/cache-memory/pr-${{ ... }}.json` for past review themes

to:

- `/tmp/gh-aw/cache-memory/pr-${{ ... }}.json` — only if it exists (use `[ -f <path> ]` before reading; skip on first-review runs)

Rationale: Most runs are first-time reviews on new PRs — the cache file won't exist. Without an explicit guard, the agent attempts the read, receives a "file not found" error, and handles it as a tool-call failure. At 13+ runs/day, this is a wasted tool call on the majority of invocations. Clarifying the guard eliminates the wasted round-trip.
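The workflow itself would do this check with the shell test `[ -f <path> ]`. The same guard-before-read logic, sketched in Python for illustration only: the function name is hypothetical, and the cache is assumed to hold JSON because of the `.json` suffix.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_past_review_themes(cache_path: str) -> Optional[Any]:
    """Return cached review data, or None when no cache file exists yet."""
    path = Path(cache_path)
    # Equivalent of `[ -f <path> ]`: true only for an existing regular file.
    if not path.is_file():
        return None  # first review of this PR: skip the read entirely
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Returning `None` on a missing file turns the first-review case into an ordinary branch instead of a failed read that has to be recovered from.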

5. Condense grumpy-coder field-validation rules · ~1 AIC/run

Action: In the grumpy-coder sub-agent prompt, replace the 4-line validation block:

If any field is malformed, fix it before returning:
- Coerce `line` to an integer.
- Drop findings with invalid `path` or invalid `severity`.
- Truncate overly long text fields to concise summaries.

with a 1-line equivalent:

Fix malformed output before returning: coerce `line` to int, drop findings with invalid `path` or `severity`, truncate long text fields.

Rationale: Minor token reduction (4 lines → 1). No functional change: the validation rules are preserved.
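For reference, the three fix-up rules can be expressed as a small post-processing pass over the sub-agent's JSONL output. This is a sketch under stated assumptions, not the workflow's actual implementation: the prompt does not define "invalid `path`" (taken here to mean a path missing from the changed-file list), the 200-character limit is a placeholder for "overly long", and dropping unparseable lines or findings whose `line` cannot be coerced is an added assumption.

```python
import json

FIELDS = ("path", "line", "severity", "headline", "impact", "fix")
SEVERITIES = ("critical", "high", "medium", "low")
MAX_TEXT = 200  # assumed limit; the prompt only says "overly long"


def fix_findings(jsonl_text, changed_files):
    """Apply the three fix-up rules to JSONL findings; return cleaned dicts."""
    changed = list(changed_files)
    cleaned = []
    for raw in jsonl_text.splitlines():
        if not raw.strip():
            continue
        try:
            finding = json.loads(raw)
        except json.JSONDecodeError:
            continue  # assumption: unparseable lines are dropped
        if not isinstance(finding, dict):
            continue
        # Drop findings with an invalid path or severity.
        if finding.get("path") not in changed:
            continue
        if finding.get("severity") not in SEVERITIES:
            continue
        # Coerce `line` to an integer.
        try:
            finding["line"] = int(finding.get("line"))
        except (TypeError, ValueError):
            continue  # assumption: uncoercible line numbers are dropped
        # Truncate long text fields.
        for key in ("headline", "impact", "fix"):
            finding[key] = str(finding.get(key, ""))[:MAX_TEXT]
        cleaned.append({k: finding[k] for k in FIELDS})
    return cleaned
```

A pass like this could also serve as a cheap check when comparing `model: small` output against `model: large` during the trial runs, since it shows how many findings each model loses to validation.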

Caveats

  • AIC data coverage: Only 4 of 21 audited runs have AIC measurements (2026-06-19 snapshot only). The 70.24 avg AIC/run is an estimate from a single day; actual average may differ.
  • grumpy-coder model change: Validate on 5–10 runs before full rollout. The main agent's second pass provides a backstop, but initial testing should confirm that model: small output quality is adequate for the advisory role.
  • Agent-phase variance: The 4.4–15.8 min range suggests PR size is the dominant cost driver. Token savings from prompt trimming will be proportionally larger on small PRs.
  • Shared imports: shared/pr-review-base.md and shared/pr-code-review-config.md add overlapping "Review Guidelines" content. These are out of scope for this issue (shared across multiple workflows), but they represent additional consolidation opportunity in a separate effort.

Total estimated savings: ~26 AIC/run · At 10–13 runs/day → 260–338 AIC/day
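The total is the sum of the five per-recommendation estimates. A short sketch of the roll-up, using only numbers stated in this issue (the dictionary labels are shorthand):

```python
# Per-run savings estimates, copied from the five recommendations (AIC/run).
savings = {
    "right-size grumpy-coder": 15,
    "remove inline Go example": 4,
    "consolidate Review Focus": 4,
    "cache-file guard": 2,
    "condense validation rules": 1,
}
baseline_aic_per_run = 70.24  # single-day average from the Cost Profile
runs_per_day = (10, 13)

per_run = sum(savings.values())
daily = tuple(per_run * n for n in runs_per_day)
share = per_run / baseline_aic_per_run

print(f"total: ~{per_run} AIC/run")
print(f"daily: {daily[0]}-{daily[1]} AIC/day")
print(f"share of baseline: {share:.0%}")
```

Against the 70.24 AIC/run baseline this is roughly a 37% reduction, though that baseline rests on only 4 measured runs, so the percentage carries the same uncertainty.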

Generated by Agentic Workflow AIC Usage Optimizer · 148 AIC · ⊞ 7.1K ·

  • expires on Jun 29, 2026, 8:35 AM UTC-08:00
