
feat: add /hyper-plan — recursive codebase improvement with convergence scoring#144

Closed
ShaheerKhawaja wants to merge 1 commit into
garrytan:mainfrom
ShaheerKhawaja:feat/hyper-plan-mode


@ShaheerKhawaja


What this does

Adds /hyper-plan, a new skill that chains /plan-ceo-review → /plan-eng-review → execute fixes → /qa into an iterative loop with LLM-as-Judge convergence control.

The problem

Individual /plan-ceo-review and /plan-eng-review passes find issues but don't close the loop. After a review, you manually decide what to fix, fix it, and hope you didn't break something else. There are no convergence criteria and no regression detection, and findings aren't tracked across iterations.

How /hyper-plan solves it

It treats codebase quality like gradient descent — each iteration moves toward a target grade, focused tighter each round on the weakest dimensions.

Iteration 1 runs full CEO + Engineering review, compiles findings into P0-P3, executes P0/P1 fixes with parallel agents, runs /qa diff-aware to verify, then scores 10 quality dimensions (1-10 each):

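| Code Quality  | Security      | Performance    | UX/UI         | Tests         |
|---------------|---------------|----------------|---------------|---------------|
| Accessibility | Documentation | Error Handling | Observability | Deploy Safety |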

Iterations 2-7 focus on ONLY the 2 lowest-scoring dimensions from the previous round. This focus narrowing is how convergence happens — reviewing everything every round causes thrashing.
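The focus-narrowing step can be sketched as a small selection over the judge's per-dimension scores. This is a minimal illustration, not the skill's actual logic: the function name `pick_focus` and the alphabetical tie-break are assumptions, since the PR does not say how ties between equal scores are resolved.

```python
def pick_focus(scores: dict[str, int], k: int = 2) -> list[str]:
    """Return the k lowest-scoring dimensions from the previous round.

    Assumption: ties are broken alphabetically by dimension name so the
    result is deterministic. The PR does not specify a tie-break rule.
    """
    ranked = sorted(scores.items(), key=lambda item: (item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical scores for the 10 dimensions listed in the PR (1-10 each).
previous_round = {
    "Code Quality": 7, "Security": 5, "Performance": 8, "UX/UI": 7,
    "Tests": 4, "Accessibility": 6, "Documentation": 6,
    "Error Handling": 7, "Observability": 6, "Deploy Safety": 8,
}
focus = pick_focus(previous_round)  # the next iteration reviews only these
```

With the hypothetical scores above, the next iteration would review only Tests and Security.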

Convergence rules:

  • SUCCESS: overall grade ≥ target (default 8.0/10)
  • CONVERGED: improvement < 0.2 for 2 consecutive iterations (plateaued)
  • DEGRADED: any dimension decreased → HALT (something went wrong)
  • MAX_REACHED: 7 iterations completed
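The four stopping rules can be sketched as a single verdict function. The thresholds (8.0, 0.2, 2 iterations, 7 iterations) come from the list above; everything else is an assumption made for illustration, including the `Iteration` record, the name `verdict`, the rule precedence (DEGRADED is checked first because it halts the loop), and the convention that `history[0]` is the baseline.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    grade: float            # overall grade on a 0-10 scale
    scores: dict[str, int]  # per-dimension scores, 1-10 each


def verdict(history: list[Iteration], target: float = 8.0,
            min_delta: float = 0.2, patience: int = 2,
            max_iterations: int = 7) -> str:
    """Decide whether the loop continues. history[0] is the baseline."""
    if len(history) < 2:
        raise ValueError("need a baseline and at least one iteration")
    previous, current = history[-2], history[-1]

    # DEGRADED: any dimension scored lower than in the previous round.
    if any(current.scores[d] < previous.scores[d] for d in previous.scores):
        return "DEGRADED"
    # SUCCESS: overall grade reached the target.
    if current.grade >= target:
        return "SUCCESS"
    # CONVERGED: improvement below min_delta for `patience` rounds in a row.
    # Deltas are rounded to avoid floating-point noise near the threshold.
    deltas = [round(b.grade - a.grade, 2) for a, b in zip(history, history[1:])]
    if len(deltas) >= patience and all(d < min_delta for d in deltas[-patience:]):
        return "CONVERGED"
    # MAX_REACHED: the iteration budget is spent (the baseline is not counted).
    if len(history) - 1 >= max_iterations:
        return "MAX_REACHED"
    return "CONTINUE"
```

Checking DEGRADED before SUCCESS means a regression in one dimension halts the loop even when the overall grade has reached the target. Whether the skill orders the rules this way is not stated in the PR.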

Key design decisions:

  • Judge reads actual source code with file:line evidence — doesn't trust fix-agent self-reports
  • Every fix batch passes a validation gate (lint + types + tests) before committing
  • Follows the Completeness Principle — when fixing, fix completely
  • All artifacts saved to .hyper-plan/ for auditability
  • Orchestrates existing skills — doesn't replace /plan-ceo-review, /plan-eng-review, or /qa
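The validation gate from the list above (lint + types + tests before committing) could look roughly like the following. This is a sketch under stated assumptions: the helper name `passes_gate` is hypothetical, the PR does not name the actual lint, type-check, or test commands, and the demo uses the Python interpreter as a stand-in command so the snippet runs anywhere.

```python
import subprocess
import sys


def passes_gate(commands: list[list[str]]) -> bool:
    """Run each check in order; fail fast on the first non-zero exit code."""
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            # A missing tool counts as a failed check, not a skipped one.
            return False
        if result.returncode != 0:
            return False
    return True


# Stand-in checks for illustration only. A real gate would list the
# project's own lint, type-check, and test commands here.
demo_checks = [
    [sys.executable, "-c", "pass"],  # stands in for a passing lint step
    [sys.executable, "-c", "pass"],  # stands in for a passing test step
]
ok_to_commit = passes_gate(demo_checks)
```

A fix batch would be committed only when `passes_gate` returns `True`. Otherwise the batch is rejected before it can affect the next scoring round.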

Example convergence

| Iteration | Grade | Delta | Focus           | Verdict  |
|-----------|-------|-------|-----------------|----------|
| Baseline  | 5.4   | —     | All             | —        |
| 1         | 6.2   | +0.8  | All             | CONTINUE |
| 2         | 6.8   | +0.6  | Tests, Security | CONTINUE |
| 3         | 7.2   | +0.4  | Perf, Deploy    | CONTINUE |
| 4         | 7.5   | +0.3  | UX, Docs        | CONTINUE |
| 5         | 7.8   | +0.3  | Errors, Observ. | CONTINUE |
| 6         | 8.1   | +0.3  | —               | SUCCESS  |
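As a quick check on the table, the Delta column can be recomputed from the Grade column and compared against the stopping thresholds quoted earlier (target 8.0, plateau threshold 0.2). The variable names are illustrative only.

```python
# Grades from the example table: baseline first, then iterations 1-6.
grades = [5.4, 6.2, 6.8, 7.2, 7.5, 7.8, 8.1]

# Per-iteration improvement, rounded to one decimal like the table.
deltas = [round(b - a, 1) for a, b in zip(grades, grades[1:])]

# No delta falls below 0.2, so the plateau rule never fires in this run.
plateau_rounds = [i + 1 for i, d in enumerate(deltas) if d < 0.2]

# First iteration whose grade reaches the default target of 8.0.
first_success = next(i for i, g in enumerate(grades) if g >= 8.0)
```

The recomputed deltas match the table, and the run ends at iteration 6 because that is the first round at or above the target.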

Files changed

  • hyper-plan/SKILL.md — new skill (154 lines)

Test plan

  • Run /hyper-plan on a sample project, verify it chains the 3 skills (/plan-ceo-review, /plan-eng-review, /qa)
  • Verify convergence stops when grade ≥ target
  • Verify HALT triggers when any dimension score decreases
  • Verify .hyper-plan/ output files are created after each iteration
  • Verify focused iterations only review the 2 flagged dimensions

…ce scoring

Adds a new skill that chains /plan-ceo-review → /plan-eng-review → execute fixes → /qa
into an iterative loop with LLM-as-Judge convergence control.

The problem: individual review passes find issues but don't close the loop. Findings
aren't tracked across iterations, there are no convergence criteria, and fixed code doesn't
get re-reviewed. /hyper-plan treats quality like gradient descent — each iteration moves
toward a target grade, focused tighter each round on the weakest dimensions.

How it works:
- Iteration 1 runs full CEO + Eng review, fixes P0/P1 findings, runs QA, then scores
  10 quality dimensions (Code Quality, Security, Performance, UX, Tests, A11y, Docs,
  Error Handling, Observability, Deploy Safety)
- Iterations 2-7 focus on only the 2 lowest-scoring dimensions from the previous round
- Stops on: SUCCESS (grade >= target), CONVERGED (delta < 0.2 twice), DEGRADED (any
  dimension decreased), or MAX_REACHED (7 iterations)
- Judge reads actual source code with file:line evidence — no self-reporting from fixers
- Every fix batch passes a validation gate (lint + types + tests) before committing
- All artifacts saved to .hyper-plan/ for auditability

Follows the Completeness Principle — when fixing an issue, fix it completely.
@garrytan

Owner

Single raw SKILL.md, no .tmpl template. Closing.
