Executive Summary
- 4 runs sampled (all with conclusion: completed) from a pool of 17 eligible runs in the last 24 h
- 4 distinct workflows covered: Copilot Opt, PR Triage Agent, Failure Investigator, Daily Safe Output Integrator
- Median first-request size: 11,564 chars · P95: 19,966 chars (Copilot Opt)
- Highest-leverage finding: copilot-opt.md inlines the full jqschema skill (3,032 chars of bash examples) and carries a 10,496-char main body; together they account for 68 % of the largest sampled prompt. Separately, pr-triage-agent.md imports a dead code-review config (901 chars) and runs an expensive frontier model on a read-only triage task.
Note on request source: no event-logs.jsonl was present in sandbox/firewall/logs/api-proxy-logs/ for any sampled run, and all user.message events in the agent session logs had empty content. All first-request sizes are therefore derived from prompt.txt (compilation artifact), cross-checked against the token-usage.jsonl first-entry input_tokens. Char/token ratios below 1.0 (PR Triage, Daily Safe Output Integrator) indicate that tool-schema JSON sent via the API adds tokens not visible in prompt.txt.
Highest-Leverage Changes
- [skills / copilot-opt] Replace the direct jqschema SKILL.md import with a lightweight reference — the verbose bash examples add 3,032 chars on every run. (High)
- [workflow-md / copilot-opt] Slim the main body (10,496 chars, 49 headings, 18 code fences); collapse example blocks and consolidate repetitive phase descriptions. (High)
- [workflow-md / pr-triage-agent] Remove the pr-code-review-config.md import (901 chars / ~1,208 tokens): it is code-review configuration that is never used in a triage-only workflow. (Medium, safe immediately)
- [workflow-md / pr-triage-agent] Downgrade engine from claude-sonnet-4.6 to claude-haiku-4-5 or gpt-4.1-mini; triage is read-only label/classify work with no code generation. (High, needs brief validation)
- [agents / copilot-opt] Move the data-gathering work (92 % of turns) into deterministic pre-agent steps: in the frontmatter; this eliminates redundant context re-sends on every turn. (Medium)
Key Metrics
| Metric | Value |
| --- | --- |
| Sampled runs | 4 |
| Distinct workflows | 4 |
| Median first-request chars | 11,564 |
| P95 first-request chars | 19,966 |
| Largest sampled request | Copilot Opt: 19,966 chars / 18,756 tokens |
Per-Run First-Request Metrics
| Run | Workflow | chars | input_tokens | char/tok | AIC | turns | Conclusion |
| --- | --- | --- | --- | --- | --- | --- | --- |
| §27569486853 | Copilot Opt | 19,966 | 18,756 | 1.065 | 401.0 | 1 | success |
| §27570973333 | PR Triage Agent | 12,603 | 16,898 | 0.746 | 629.8 | 1 | success |
| §27572546323 | Failure Investigator | 10,526 | 4,473 | 2.353† | 572.8 | 38 | success |
| §27572285541 | Daily Safe Output Integrator | 10,004 | 16,134 | 0.620 | 323.5 | 1 | failure |
† Failure Investigator uses the claude (Claude Code) engine; prompt.txt is significantly larger than what Claude Code sends as its first API message, so the ratio is not directly comparable.
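The char/tok column and the summary statistics can be re-derived from the table with a few lines of stdlib Python. This is only a sketch: the report does not say which percentile method the analysis script uses, so the nearest-rank rule below is an assumption, and the helper names are illustrative.

```python
import math
import statistics

# (workflow, prompt.txt chars, first-entry input_tokens), copied from the table above
runs = [
    ("Copilot Opt", 19_966, 18_756),
    ("PR Triage Agent", 12_603, 16_898),
    ("Failure Investigator", 10_526, 4_473),
    ("Daily Safe Output Integrator", 10_004, 16_134),
]


def char_per_token(chars, tokens):
    """Characters in prompt.txt per reported input token, rounded to 3 places."""
    return round(chars / tokens, 3)


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (assumed method, not confirmed by the report)."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


ratios = {name: char_per_token(c, t) for name, c, t in runs}
sizes = [c for _, c, _ in runs]
median_chars = statistics.median(sizes)          # mean of the two middle values
p95_chars = nearest_rank_percentile(sizes, 95)   # with 4 samples this is the maximum

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
print("median:", median_chars, "p95:", p95_chars)
```

With only four samples the P95 under this rule is simply the largest request, which matches the Copilot Opt figure reported above; the median of 11,564.5 appears in the report truncated to 11,564.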
Repeated Ambient Context Signals
- Shared 1,363-char security-policy prefix identical across all 4 runs (infrastructure-injected; not actionable per workflow author).
- <safe-outputs> block (2,578 chars) repeated verbatim in all 4 runs (infrastructure-injected).
- reporting.md (497 chars) imported into every run that produces reports; content is short and shared, so no savings available here.
- noop-reminder.md (411 chars) duplicated in the imports of copilot-opt and pr-triage-agent alongside the infrastructure <safe-outputs> block that already covers the same requirement; possible double-instruction overlap.
- pr-code-review-config.md in pr-triage-agent injects 901 chars of review-tool guidance (diff fetching, comment threading) into a workflow that only applies labels and writes one issue, so it is entirely dead context.
- jqschema SKILL.md embeds two full worked examples with multi-step bash pipelines (3,032 chars total), the largest imported skill block observed.
- <safe-output-tools> section in Daily Safe Output Integrator is 3,984 chars, 65 % larger than the same section in Copilot Opt (2,409 chars), which suggests the workflow's safe-output configuration exposes more tools than necessary; worth auditing.
Deterministic Analysis Output
Script: /tmp/gh-aw/ambient-context/analyze_requests.py (stdlib only)
Top sections by line count (across sampled runs):
| Workflow | Largest section | Lines |
| --- | --- | --- |
| Copilot Opt | (preamble / system) | 73 |
| PR Triage Agent | (preamble / system) | 87 |
| Failure Investigator | (preamble / system) | 61 |
| Daily Safe Output Integrator | (preamble / system) | 73 |
The preamble (infrastructure tags + imported shared context) dominates first-request line counts in every run; the actual workflow-specific sections are relatively compact.
Keyword line density (lines mentioning keyword / total lines):
| Workflow | tools | agents | safe_outputs | workflow |
| --- | --- | --- | --- | --- |
| Copilot Opt | 30 | 17 | 13 | 14 |
| PR Triage Agent | 19 | 8 | 13 | 11 |
| Failure Investigator | 19 | 14 | 12 | — |
| Daily Safe Output Integrator | 18 | 3 | 21 | — |
Copilot Opt has the highest agent-keyword density, consistent with the 49 headings and multi-phase orchestration description.
Duplicate line ratios: Copilot Opt 0.048 (highest), all others 0.000. The duplicates in Copilot Opt stem from repeated bash command patterns in the jqschema skill examples.
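The report does not show how analyze_requests.py defines these two metrics. A minimal sketch, assuming the duplicate line ratio is the share of non-blank lines that repeat an earlier line and keyword density is the share of non-blank lines containing the keyword (both function names are hypothetical):

```python
from collections import Counter


def duplicate_line_ratio(text):
    """Fraction of non-blank lines that repeat an earlier line (assumed definition)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    counts = Counter(lines)
    repeats = sum(n - 1 for n in counts.values())  # every copy after the first
    return repeats / len(lines)


def keyword_line_density(text, keyword):
    """Fraction of non-blank lines mentioning the keyword, case-insensitive."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    hits = sum(1 for ln in lines if keyword.lower() in ln.lower())
    return hits / len(lines)


# Illustrative input only, not taken from any sampled prompt.
sample = "jq -r . a.json\njq -r . a.json\nUse tools wisely\n\nReport results"
print(duplicate_line_ratio(sample))           # one repeat among four non-blank lines
print(keyword_line_density(sample, "tools"))  # one matching line among four
```

Under this definition a repeated bash pattern, such as the jqschema examples, raises the ratio for every copy after the first, which is consistent with Copilot Opt being the only workflow with a non-zero value.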
Recommendations by Category
Workflow Markdown
- pr-triage-agent.md: remove pr-code-review-config.md import (Medium, safe immediately)
  - The pr-review-base.md import pulls in pr-code-review-config.md (901 chars), which documents how to fetch diffs and submit code reviews. PR Triage only applies labels and creates one report issue; it never calls review tools.
  - Removing this import saves ~1,208 tokens on every scheduled run (6 h cadence → ~4 runs/day).
  - Also confirm whether the pr-review-base.md import itself is needed, or if only the pr-data portion is required.
- pr-triage-agent.md: downgrade engine model (High, needs brief validation)
  - The workflow runs claude-sonnet-4.6 for a read-only label/classify/issue-create task. The audit flagged it as a model-downgrade candidate.
  - Set engine.model: claude-haiku-4-5 (or gpt-4.1-mini) in the frontmatter. Estimated cost reduction: 8–10× per run at similar quality for structured triage.
- copilot-opt.md: slim the main body (High, needs review)
  - 10,496 chars with 49 headings and 18 code fences is the single largest workflow body in the sample. The verbose Phase 0–3 descriptions and multi-step bash examples can be condensed.
  - Target: reduce to ≤6,000 chars by collapsing example code into one-liner hints, merging phases that share a single decision point, and moving static background into a separate imported stub loaded only when needed.
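The ~1,208-token figure for the pr-code-review-config.md import can be reproduced from the numbers already in this report. The sketch below assumes the whole-request char/tok ratio for PR Triage Agent (0.746) also applies to this one import, which is an approximation; the function names are illustrative.

```python
def estimated_tokens(chars, chars_per_token):
    """Convert a character count to tokens using an observed chars-per-token ratio."""
    return round(chars / chars_per_token)


def runs_per_day(cadence_hours):
    """Number of scheduled runs in 24 h for a fixed cadence."""
    return 24 // cadence_hours


# 901 chars and the 0.746 ratio come from the tables above; 6 h is the stated cadence.
saved_per_run = estimated_tokens(901, 0.746)
saved_per_day = saved_per_run * runs_per_day(6)

print(saved_per_run, saved_per_day)
```

The daily total is only a first-request estimate; any saving on later turns of a multi-turn run would come on top of it.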
Skills
copilot-opt.md: replace jqschema SKILL.md inline import with a short reference (High, safe immediately)
- The import embeds 3,032 chars of usage narrative and bash examples (two full worked workflows with echo + pipeline steps). The skill's core value is in the jqschema.sh script path; the narrative examples can be cut to a 3-line stub: path, input format, output format.
- Alternatively, load the skill only when a phase requires schema discovery (not in the system preamble).
- Removing verbose examples saves ~2,400–2,700 chars from every Copilot Opt first request.
Agents
copilot-opt.md: move data-fetching to deterministic steps: (Medium, needs design)
- The audit found 92 % of turns are data-gathering (session log reads, PR fetches, file loads). These are deterministic and can run as pre-agent steps: that write their results to /tmp/gh-aw/agent/.
- The workflow already imports shared/copilot-session-data-fetch.md and shared/copilot-pr-data-fetch.md; confirm these are being executed as pre-agent steps (not as runtime-imported LLM instructions). If they are LLM-directed at runtime, move them to deterministic steps: in the frontmatter.
- This reduces multi-turn context growth and eliminates re-sending the full prompt on each data-gathering step.
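As a rough illustration of the pre-agent pattern described above, a deterministic step could run every fetch once and leave the results on disk for the agent to read. The sketch below is hypothetical throughout: the prefetch helper, the fetcher names, and the placeholder records are not taken from the workflow, and the demo writes to a temporary directory rather than /tmp/gh-aw/agent/.

```python
import json
import tempfile
from pathlib import Path


def prefetch(fetchers, out_dir):
    """Run each fetcher once and write its result as JSON; return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, fetch in fetchers.items():
        path = out / f"{name}.json"
        path.write_text(json.dumps(fetch(), indent=2), encoding="utf-8")
        written[name] = path
    return written


# Hypothetical stand-ins for the real session-log and PR fetches (placeholder data).
fetchers = {
    "session_logs": lambda: [{"id": 1, "events": 12}],
    "pull_requests": lambda: [{"number": 101, "state": "open"}],
}

demo_dir = tempfile.mkdtemp()  # a real step would target the agent's data directory
paths = prefetch(fetchers, demo_dir)
print(sorted(p.name for p in paths.values()))
```

The agent prompt then only needs to name the files to read, so each fetch no longer costs a turn that re-sends the full context.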
References
Generated by 🌫️ Daily Ambient Context Optimizer · 1.6K AIC · ⌖ 14.5 AIC · ⊞ 21.9K · ◷