[codex] add skill context budget guardrails#1264

Open
mzhaom wants to merge 1 commit into garrytan:main from mzhaom:chore/skill-context-budget-plan

@mzhaom mzhaom commented Apr 28, 2026

Summary

Adds the first implementation slice from docs/designs/SKILL_CONTEXT_BUDGET.md:

  • adds scripts/skill-context-budget.ts with --report and --check modes
  • wires bun run skill:budget and bun run skill:budget:check
  • includes context-budget output in bun run skill:check
  • slims generated skill frontmatter descriptions and keeps routing phrases in triggers: metadata
  • adds budget regression tests for visible description and eager catalog size
  • regenerates the generated SKILL.md files from the updated templates

Why this helps Codex context windows

Codex context is not spent only on the user's prompt and the repo files it reads. A real coding turn also carries system/developer instructions, tool schemas, AGENTS.md guidance, conversation history, and any skill catalog text exposed for routing. That means eager skill discovery text competes directly with source files, diffs, test output, and review context before Codex has started the task.

Before this PR, visible generated skill frontmatter descriptions alone were about 20,919 chars (~5,230 tokens), and the eager catalog estimate was about 22,791 chars (~5,698 tokens). The full visible generated SKILL.md bodies come to roughly 570k tokens, so any flow that blurs discovery and execution, loading execution text while still routing, can dominate even a large context window. There is already a concrete Codex symptom in test/skill-e2e-workflow.test.ts: the Codex workflow test slices out only the review-relevant section because the full codex/SKILL.md is too large to read in one turn.

This PR separates the cheap routing layer from the expensive execution layer. After slimming the descriptions, visible descriptions are 6,491 chars (~1,623 tokens), and the eager catalog estimate is 8,365 chars (~2,092 tokens). Codex keeps more of its window available for the actual task, while selected skills still load their full workflow text when needed.
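The token figures quoted above are all consistent with a simple "about 4 characters per token, rounded up" heuristic. The sketch below reproduces them under that assumption; `CHARS_PER_TOKEN` and `estimateTokens` are illustrative names, not necessarily what scripts/skill-context-budget.ts uses.

```typescript
// Assumption: "approximate tokens" means ceil(chars / 4). This matches the
// numbers in this PR description but is not confirmed by the script itself.
const CHARS_PER_TOKEN = 4;

function estimateTokens(chars: number): number {
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Character counts quoted in the PR description.
const figures: Record<string, number> = {
  descriptionsBefore: 20_919,
  catalogBefore: 22_791,
  descriptionsAfter: 6_491,
  catalogAfter: 8_365,
};

for (const [label, chars] of Object.entries(figures)) {
  console.log(`${label}: ${chars} chars, about ${estimateTokens(chars)} tokens`);
}
```

A character-based estimate is deliberately rough: it avoids a tokenizer dependency and is stable across models, which is usually enough for a regression guardrail.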

Impact

This reduces eager discovery context while preserving skill execution bodies. Current visible metrics from bun run skill:budget:check:

  • visible generated skills: 47
  • visible generated bytes: 2.18 MB (~570,605 tokens)
  • visible description chars: 6,491, down from about 20,919
  • eager catalog estimate: 8,365 chars, under the 11,000 target
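As a quick worked check of these metrics, the sketch below computes the relative reduction and the remaining headroom under the 11,000-char target. The "before" values are the approximate figures quoted in this description, and the helper names are hypothetical.

```typescript
// Relative reduction between two character counts, as a percentage.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

const CATALOG_TARGET_CHARS = 11_000; // target quoted in the metrics above

// "Before" figures are approximate, per the PR description.
const descriptionCut = percentReduction(20_919, 6_491);
const catalogCut = percentReduction(22_791, 8_365);
const headroomChars = CATALOG_TARGET_CHARS - 8_365;

console.log(`description chars reduced by about ${Math.round(descriptionCut)}%`); // about 69%
console.log(`eager catalog reduced by about ${Math.round(catalogCut)}%`); // about 63%
console.log(`headroom under target: ${headroomChars} chars`); // 2635 chars
```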

The budget check is intentionally conservative for this PR: hard failures are limited to parser errors, generated skills over the 160 KB ceiling, and changed templates with descriptions over 360 chars. Existing oversized skill bodies remain warnings.
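The hard-failure policy described above could be expressed roughly as follows. This is a minimal sketch, not the code in scripts/skill-context-budget.ts: the `SkillReport` shape, the field names, and the reading of "160 KB" as 160 * 1024 bytes are all assumptions made for illustration.

```typescript
// Hypothetical per-skill record; the real script's types are not shown in this PR.
interface SkillReport {
  name: string;
  generatedBytes: number;   // size of the generated SKILL.md
  descriptionChars: number; // length of the frontmatter description
  templateChanged: boolean; // whether this PR touched the skill's template
  parseError?: string;      // set when the frontmatter could not be parsed
}

const MAX_GENERATED_BYTES = 160 * 1024; // assumption: "160 KB" means 160 * 1024 bytes
const MAX_DESCRIPTION_CHARS = 360;

// Returns only hard failures. Other oversized bodies stay warnings and are not modeled here.
function hardFailures(skill: SkillReport): string[] {
  const failures: string[] = [];
  if (skill.parseError !== undefined) {
    failures.push(`${skill.name}: parser error: ${skill.parseError}`);
  }
  if (skill.generatedBytes > MAX_GENERATED_BYTES) {
    failures.push(`${skill.name}: generated skill exceeds the 160 KB ceiling`);
  }
  // The description limit applies only to templates changed in the PR.
  if (skill.templateChanged && skill.descriptionChars > MAX_DESCRIPTION_CHARS) {
    failures.push(`${skill.name}: description exceeds 360 chars`);
  }
  return failures;
}
```

Scoping the description limit to changed templates lets the check land without first rewriting every existing skill; follow-up item 3 below describes widening it to all templates.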

Follow-up Work

  1. Shared preamble slim: move expanded voice, writing style, context recovery, search-before-building, and completeness guidance into lazy-loaded references/preamble/*.md; ratchet tier >= 2 preamble warnings toward 22 KB.
  2. Split codex first: extract review, challenge, consult, and session-continuity modes into codex/references/*.md; update the E2E workflow test so it reads the full generated codex/SKILL.md instead of slicing a section.
  3. Ratchet budgets after the first mega-skill split: lower body warning targets and make the 360-char description hard limit apply to all templates, not only changed ones.
  4. Repeat the split pattern for review, ship, the plan-review family, qa, and design-review.
  5. Tighten host output hygiene so generated hidden host skill directories remain clearly non-discoverable outside their intended host install paths.

Validation

  • bun install
  • bun run gen:skill-docs
  • bun run skill:budget:check
  • bun run skill:check
  • bun test test/skill-context-budget.test.ts
  • bun test test/gen-skill-docs.test.ts
  • bun test test/skill-context-budget.test.ts test/gen-skill-docs.test.ts test/skill-validation.test.ts (693 pass, 0 fail)

@mzhaom mzhaom marked this pull request as ready for review April 28, 2026 22:50