feat: /challenge skill — Polya 4-stage plan stress-test#1231

Draft
amargandhi wants to merge 1 commit into garrytan:main from amargandhi:pr/challenge-skill
Conversation

@amargandhi

Summary

Adds /challenge — a pure-analysis plan stress-test grounded in Polya's 1945 *How to Solve It*. Walks a plan through 4 stages (Understand → Devise → Carry out → Look back) with 3-5 hard adversarial questions per stage. Each question includes the agent's recommended answer and a P1/P2/P3 priority. Verdict at top: READY / OPEN QUESTIONS / CRITICAL GAPS.
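To make the review concrete, here is a minimal sketch of the report shape the summary describes. The type names and the priority-to-verdict mapping are assumptions for illustration, not taken from SKILL.md.tmpl:

```typescript
// Hypothetical shape of a /challenge report. The verdict rule below
// (any P1 => CRITICAL GAPS, any P2 => OPEN QUESTIONS, else READY) is
// one plausible reading of the priorities, not the skill's actual logic.

type Priority = "P1" | "P2" | "P3";
type Stage = "Understand" | "Devise" | "Carry out" | "Look back";

interface Question {
  id: string; // e.g. "Q2.3" — stage 2, question 3
  stage: Stage;
  text: string;
  recommendedAnswer: string;
  priority: Priority;
}

type Verdict = "READY" | "OPEN QUESTIONS" | "CRITICAL GAPS";

function verdict(questions: Question[]): Verdict {
  if (questions.some((q) => q.priority === "P1")) return "CRITICAL GAPS";
  if (questions.some((q) => q.priority === "P2")) return "OPEN QUESTIONS";
  return "READY";
}
```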

Why a new skill

Plans that land in /ship unchallenged tend to ship bugs that a 5-minute stress-test would have caught. The failure mode isn't stupidity — it's that the author is too close to the plan to see what's missing. /plan-eng-review covers architecture mechanics; /plan-ceo-review covers scope. Neither walks Polya's stages methodically — unstated assumptions, missing rollback, ambiguous acceptance criteria, "what if it takes 3x longer." /challenge fills that gap.

Complement to /codex challenge: /codex challenge gets a cross-model adversarial opinion. /challenge is methodological + deterministic — same skill, same model, same questions. Use both for high-stakes plans.

Hard gate

Pure analysis. No plan edits, no code, no implementation. Output is a report at ~/.gstack/challenges/<date>-<slug>.md. If the user wants to apply fixes after, that's a separate /plan-ceo-review, /plan-eng-review, or direct edit invocation. Documented prominently in the template.
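The report path convention (`~/.gstack/challenges/<date>-<slug>.md`) can be sketched as below. `slugify` and `reportPath` are illustrative helpers, not the skill's actual code:

```typescript
// Sketch of the ~/.gstack/challenges/<date>-<slug>.md naming scheme.
// Helper names are hypothetical.

function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs to hyphens
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
}

function reportPath(planTitle: string, date: Date): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `~/.gstack/challenges/${day}-${slugify(planTitle)}.md`;
}
```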

What this PR adds

  1. challenge/SKILL.md.tmpl (354 lines) — defines the 4-stage walk with question banks per stage (Q1.1-Q4.5), HARD GATE prose, output format, and ranking heuristics.

  2. scripts/resolvers/preamble/generate-routing-injection.ts — adds one bullet so Claude/Codex auto-route phrases like "stress-test", "poke holes", "what could go wrong", "red-team" to /challenge.

  3. Generated SKILL.md across all 10 hosts (Claude, Codex, OpenCode, Cursor, Factory, Slate, Kiro, Hermes, GBrain, OpenClaw) — ~12K tokens per generated skill, well under the 40K ceiling.
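The routing bullet in item 2 asks the host model to route certain phrases to /challenge. The effect it requests looks roughly like the phrase check below — an illustration only; the real generate-routing-injection.ts just emits a preamble bullet, not matching code:

```typescript
// Hypothetical illustration of the routing behavior the preamble bullet
// requests from Claude/Codex. Trigger phrases are taken from the PR text.

const CHALLENGE_TRIGGERS = [
  "stress-test",
  "poke holes",
  "what could go wrong",
  "red-team",
];

function routesToChallenge(prompt: string): boolean {
  const p = prompt.toLowerCase();
  return CHALLENGE_TRIGGERS.some((t) => p.includes(t));
}
```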

Arguments

  • --scope <stage> — focus on one Polya stage instead of all 4
  • --dry-run — show questions without saving the report
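A minimal sketch of how the two flags could be parsed. The kebab-case stage names and the `ChallengeArgs` shape are assumptions for illustration:

```typescript
// Hypothetical parser for /challenge's two flags. Stage argument values
// are assumed to mirror Polya's four stages in kebab-case.

const STAGES = ["understand", "devise", "carry-out", "look-back"] as const;
type StageArg = (typeof STAGES)[number];

interface ChallengeArgs {
  scope?: StageArg; // --scope <stage>: limit the walk to one stage
  dryRun: boolean; // --dry-run: print questions, skip the report file
}

function parseChallengeArgs(argv: string[]): ChallengeArgs {
  const args: ChallengeArgs = { dryRun: false };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--dry-run") {
      args.dryRun = true;
    } else if (argv[i] === "--scope") {
      const stage = argv[++i] as StageArg;
      if (!STAGES.includes(stage)) throw new Error(`unknown stage: ${stage}`);
      args.scope = stage;
    }
  }
  return args;
}
```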

Test plan

  • bun test test/skill-validation.test.ts test/gen-skill-docs.test.ts — 689/689 pass
  • bun run gen:skill-docs --host all — clean regen, no warnings, no token-ceiling alerts
  • Built against upstream/main at v1.15.0.0
  • Plan-mode E2E coverage via a new runPlanSkillObservation() harness is out of scope for this PR; happy to add it as a follow-up if requested

Notes for review

  • This is one of seven canonical-literature-grounded skills on my fork. The others (/glossary — Evans DDD, /cso --stability — Nygard patterns, /investigate --file-issue, Fowler's 24 code smells in /review, etc.) are larger and I'm holding them for separate PRs based on your appetite.
  • Voice and style follow gstack conventions: short sentences, named sources, no AI vocabulary, no em dashes in the new prose. I've kept the template style consistent with /plan-eng-review and /plan-ceo-review.

Pure-analysis plan stress-test grounded in Polya's 1945 *How to Solve It*.
Stress-tests a plan across 4 stages (Understand → Devise → Carry out →
Look back) with 3-5 hard adversarial questions per stage. Each question
includes the agent's recommended answer and a P1/P2/P3 priority. Verdict
at top: READY / OPEN QUESTIONS / CRITICAL GAPS.

**Why a new skill:** Plans that land in /ship unchallenged tend to ship
bugs that a five-minute stress-test would have caught. The author is too
close to the plan to see what's missing. /plan-eng-review covers
architecture mechanics; /plan-ceo-review covers scope. Neither walks
Polya's stages methodically — unstated assumptions, missing rollback,
ambiguous acceptance criteria, what-if-it-takes-3x-longer. /challenge
fills that gap.

**Hard gate:** Pure analysis. No plan edits, no code, no implementation.
Output is a report at ~/.gstack/challenges/<date>-<slug>.md. If the user
wants to apply fixes after, that's a separate /plan-ceo-review,
/plan-eng-review, or direct edit invocation.

**Arguments:**
- `--scope <stage>` — focus on one Polya stage instead of all 4
- `--dry-run` — show questions without saving the report

**Where it fits:** Run before /ship when reversibility matters
(production database changes, public API changes, anything you can't
easily undo). Complements /codex challenge (which gets a second-model
opinion) — /challenge is methodological + deterministic; /codex is
cross-model + non-deterministic.

What this PR adds:

1. `challenge/SKILL.md.tmpl` (354 lines) — the skill template. Defines
   the 4-stage walk with question banks per stage (Q1.1-Q4.5), HARD GATE
   prose, output format, and ranking heuristics.
2. `scripts/resolvers/preamble/generate-routing-injection.ts` — adds
   one bullet so Claude/Codex auto-route "stress-test", "poke holes",
   "what could go wrong", "red-team" prompts to /challenge.
3. Generated SKILL.md across all 10 hosts (Claude, Codex, OpenCode,
   Cursor, Factory, Slate, Kiro, Hermes, GBrain, OpenClaw) — ~12K tokens
   per skill, well under the 40K ceiling.

Tests: 689/689 pass on `bun test test/skill-validation.test.ts
test/gen-skill-docs.test.ts`.

Built and tested against upstream/main at v1.15.0.0.
