feat(workflows): add --gate-script for non-interactive gate testing by doquanghuy · Pull Request #2667 · github/spec-kit

doquanghuy · 2026-05-21T16:33:47Z

Description

Closes #2594.

Adds a documented contract for scripting gate verdicts during
specify workflow run, so CI and test runners can drive workflows
through gated paths without operator interaction. Implements
shape A from the issue (--gate-script flag with a typed
speckit.gate-script/v1 schema).

Note for maintainers: the issue invited a polite close if
gate behaviour is intentionally manual-only. Submitting the PR
so reviewers can see the concrete shape — happy to close
cleanly if that's the call.

Why

Spec Kit's gate step prompts interactively for a verdict.
Without a documented way to script those verdicts:

- Spec Kit's own CI cannot exercise gate behaviour end-to-end:
  changes to gate prompt format, output channel, or switch
  routing on verdict can regress without coverage.
- Workflows authored on Spec Kit that depend on gate semantics
  cannot be CI-tested without hand-rolling a pexpect driver
  against an undocumented prompt format.

Canonical usage

```bash
specify workflow run my-pipeline --gate-script verdicts.yaml
```

```yaml
# verdicts.yaml
schema: speckit.gate-script/v1
verdicts:
  - gate_id: review-overview
    iteration: 0
    verdict: improve
  - gate_id: review-overview
    iteration: 1
    verdict: approve
  - gate_id: review-final
    iteration: 0
    verdict: approve
```
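The schema is small enough that its validation rules can be sketched directly. The following is a minimal, hypothetical take on what parse_gate_script might check, based only on the rules this PR lists (schema gating, a required verdicts list, mapping entries, required fields, and an iteration that must be an int but not a bool). It works on already-loaded data, so no YAML library is needed; GateScriptError and REQUIRED_FIELDS are illustrative names, not the module's actual API.

```python
from typing import Any

SCHEMA = "speckit.gate-script/v1"
REQUIRED_FIELDS = ("gate_id", "iteration", "verdict")  # illustrative constant


class GateScriptError(ValueError):
    """Hypothetical error type for a script that violates the v1 schema."""


def parse_gate_script(data: Any) -> list[dict[str, Any]]:
    """Validate already-loaded YAML data and return its verdict entries."""
    if not isinstance(data, dict):
        raise GateScriptError("gate script must be a mapping")
    if data.get("schema") != SCHEMA:
        raise GateScriptError(f"expected schema {SCHEMA!r}, got {data.get('schema')!r}")
    verdicts = data.get("verdicts")
    if not isinstance(verdicts, list):
        raise GateScriptError("'verdicts' must be a list")

    parsed = []
    for index, entry in enumerate(verdicts):
        if not isinstance(entry, dict):
            raise GateScriptError(f"verdicts[{index}] must be a mapping")
        missing = [name for name in REQUIRED_FIELDS if name not in entry]
        if missing:
            raise GateScriptError(f"verdicts[{index}] is missing {missing}")
        iteration = entry["iteration"]
        # bool is a subclass of int (isinstance(True, int) is True), so bools
        # must be rejected explicitly or `iteration: true` would pass as 1.
        if isinstance(iteration, bool) or not isinstance(iteration, int):
            raise GateScriptError(f"verdicts[{index}].iteration must be an int")
        parsed.append({name: entry[name] for name in REQUIRED_FIELDS})
    return parsed


script = parse_gate_script({
    "schema": "speckit.gate-script/v1",
    "verdicts": [{"gate_id": "review-overview", "iteration": 0, "verdict": "improve"}],
})
```

A file-based loader such as load_gate_script(path) would presumably read the file with a YAML parser and hand the resulting mapping to a function like this, so the file path and the in-memory tests exercise the same rules.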

Behaviour

- gate_id matches the workflow YAML step id (the
  author-visible base id). Engine-internal loop namespacing such as
  parent:child:N is unwrapped automatically via
  extract_base_gate_id.
- iteration selects which firing the verdict applies to
  (0-indexed, counting from the first time the gate runs within
  the current run; the per-gate counter is held on StepContext).
- verdict is the option string the gate would otherwise
  produce, typically approve / reject / edit or a custom
  route name.
- When no entry matches, the gate falls back to its normal
  behaviour (interactive prompt on a TTY, PAUSED otherwise), so
  workflows can script only the gates they care about.
- output.scripted records True for scripted verdicts and
  False for interactive ones, so downstream steps can
  distinguish them.
- Scripted reject verdicts honour the gate's on_reject
  setting exactly as operator-driven rejects do (abort →
  ABORTED, skip → COMPLETED, retry → PAUSED).
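The matching rules can be illustrated with three small helpers: unwrap a namespaced step id, bump a per-gate firing counter, and scan the script for the first matching (gate_id, iteration) pair. This is an illustrative sketch, not the PR's code. In particular, the assumption that a namespaced id has the form parent:child:N with a numeric trailing index, and that the base id is the child segment, is inferred from the description rather than taken from the engine; next_firing is a hypothetical name.

```python
from typing import Optional


def extract_base_gate_id(step_id: str) -> str:
    """Unwrap a namespaced id like 'parent:child:N' to 'child'.

    Assumption: namespaced ids end in ':<digits>' and the base id is the
    segment just before that index; any other id is returned unchanged.
    """
    parts = step_id.split(":")
    if len(parts) >= 3 and parts[-1].isdigit():
        return parts[-2]
    return step_id


def next_firing(counts: dict[str, int], base_id: str) -> int:
    """Return this firing's 0-based index and bump the per-gate counter."""
    iteration = counts.get(base_id, 0)
    counts[base_id] = iteration + 1
    return iteration


def lookup_scripted_verdict(script: list[dict], base_id: str, iteration: int) -> Optional[str]:
    """Return the first matching verdict, or None so the gate falls back."""
    for entry in script:
        if entry["gate_id"] == base_id and entry["iteration"] == iteration:
            return entry["verdict"]
    return None


# Entries from the canonical verdicts.yaml, already parsed.
SCRIPT = [
    {"gate_id": "review-overview", "iteration": 0, "verdict": "improve"},
    {"gate_id": "review-overview", "iteration": 1, "verdict": "approve"},
    {"gate_id": "review-final", "iteration": 0, "verdict": "approve"},
]

# Hypothetical sequence of gate step ids as the engine might report them.
firings = ["review-overview", "loop:review-overview:1", "review-final", "review-summary"]
counts: dict[str, int] = {}
verdicts = []
for step_id in firings:
    base_id = extract_base_gate_id(step_id)
    verdicts.append(lookup_scripted_verdict(SCRIPT, base_id, next_firing(counts, base_id)))

print(verdicts)  # ['improve', 'approve', 'approve', None]
```

The final None is the partial-script case: review-summary has no entry, so a real gate would fall back to prompting on a TTY or pausing.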

Implementation

- src/specify_cli/workflows/gate_script.py: new module with
  load_gate_script(path), parse_gate_script(data),
  lookup_scripted_verdict(script, base_id, iteration), and
  extract_base_gate_id(step_id) helpers. The schema and
  validator live in one place, so the CLI loader, the engine
  consultation path, and the tests share one contract.
- StepContext gains two fields: gate_script (the parsed list) and
  gate_firing_counts (a per-base-gate-id counter). Both default
  to empty, so existing constructions are unaffected.
- WorkflowEngine.execute(...) and WorkflowEngine.resume(...)
  gain an optional gate_script parameter that flows into the
  StepContext.
- GateStep.execute consults the script first: it
  increments the firing counter for the current gate's base id,
  looks up the scripted verdict, and, if one is found, returns it
  directly (honouring on_reject for scripted rejects). When no entry
  matches, behaviour is byte-identical to before this change.
- specify workflow run gains a --gate-script PATH option that
  loads and validates the script before passing it to the engine.
  Loader errors are fatal: the operator asked for a non-interactive
  run, so silently falling back to prompts on a broken script
  would be worse than failing fast.
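The consultation step in GateStep.execute can be sketched as a function that either returns a scripted result or returns None to signal the unchanged interactive path. This is a hypothetical reconstruction: StepStatus, GateResult, consult_gate_script, the ON_REJECT_STATUS table, the literal "reject" verdict string, and the "verdict" output key are all assumptions; only the StepContext field names, the status names, and the on_reject mapping come from this PR's description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class StepStatus(Enum):
    # Status names from the PR text; the real enum may differ.
    COMPLETED = "completed"
    PAUSED = "paused"
    ABORTED = "aborted"


@dataclass
class StepContext:
    # Both fields default to empty, so existing constructions are unaffected.
    gate_script: list = field(default_factory=list)
    gate_firing_counts: dict = field(default_factory=dict)


@dataclass
class GateResult:
    status: StepStatus
    output: dict


# on_reject -> final status, as listed in the Behaviour section.
ON_REJECT_STATUS = {
    "abort": StepStatus.ABORTED,
    "skip": StepStatus.COMPLETED,
    "retry": StepStatus.PAUSED,
}


def consult_gate_script(ctx: StepContext, base_id: str, on_reject: str) -> Optional[GateResult]:
    """Return a scripted result, or None to fall back to the interactive path."""
    iteration = ctx.gate_firing_counts.get(base_id, 0)
    ctx.gate_firing_counts[base_id] = iteration + 1
    verdict = next(
        (e["verdict"] for e in ctx.gate_script
         if e["gate_id"] == base_id and e["iteration"] == iteration),
        None,
    )
    if verdict is None:
        return None  # caller prompts on a TTY, or pauses otherwise
    # Scripted rejects follow on_reject; any other verdict completes the gate.
    status = ON_REJECT_STATUS[on_reject] if verdict == "reject" else StepStatus.COMPLETED
    return GateResult(status=status, output={"verdict": verdict, "scripted": True})
```

Returning None rather than a sentinel result keeps the unscripted path untouched, which is what lets the default behaviour stay byte-identical.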

Default behaviour preserved

Workflows that don't pass --gate-script see no difference. The
only new output field is output.scripted: False for unscripted
gates — additive, no existing field removed or renamed. Existing
gate tests use field-specific assertions rather than full output
shape, so nothing breaks.

Not in scope here

- Shape B (documented prompt-format regex for pexpect drivers)
  from the issue. Shape A is more CI-friendly; B can be added
  later as a separate PR if there's demand.
- Persisting the firing counter across resume. The counter
  lives on StepContext only, not RunState. For scripted CI
  runs this is fine (the run completes in one shot). Adding it to
  RunState is a follow-on if scripted gates need to survive
  pause-resume cycles.

Testing

- Tested locally with uv run specify --help: the new
  --gate-script flag appears under workflow run.
- Ran the existing tests with uv sync && uv run pytest
  → 2980 passed, 35 skipped (was 2960 before; +20 new
  tests added in this PR).
- Tested with a sample workflow: ran a 2-gate workflow with a
  verdicts.yaml driving both verdicts to approve. The workflow
  completed end-to-end with zero operator input; state.step_results
  showed output.scripted = True on both gates. Re-running the
  same workflow without --gate-script PAUSED at the first gate,
  as expected (non-TTY).

New test coverage

TestGateScriptLoader (14 unit tests):

| Test | What it locks |
|---|---|
| `test_parse_valid_script_returns_verdicts` | Happy path. |
| `test_parse_rejects_wrong_schema` | Schema gating. |
| `test_parse_rejects_missing_verdicts` | Required top-level field. |
| `test_parse_rejects_non_mapping_verdict` | Per-entry shape. |
| `test_parse_rejects_missing_required_fields` | Field completeness. |
| `test_parse_rejects_non_int_iteration` | Type strictness. |
| `test_parse_rejects_bool_iteration` | `bool`-is-a-subclass-of-`int` trap. |
| `test_lookup_finds_matching_entry` | Lookup semantics. |
| `test_lookup_returns_none_for_no_match` | Missing-entry contract. |
| `test_lookup_empty_script_returns_none` | Empty-script safety. |
| `test_extract_base_gate_id_unwrapped` | Identity when not namespaced. |
| `test_extract_base_gate_id_unwraps_loop_namespacing` | Inverts the engine's `parent:child:N` namespacing. |
| `test_load_from_file` / `test_load_missing_file_raises` | File-loader path. |

TestGateScriptIntegration (6 end-to-end tests):

| Test | What it locks |
|---|---|
| `test_gate_uses_scripted_verdict_instead_of_prompting` | Core contract: the script short-circuits the prompt, `output.scripted = True`. |
| `test_improve_then_approve_cycle_via_switch_unroll` | Canonical CI scenario: two gates driven entirely by script. Decoupled from the still-open loop-namespacing bug (#2592) by using a switch-unrolled cycle rather than a `do-while`. |
| `test_iteration_counter_increments_on_repeated_gate_id` | Per-gate counter is independent across multiple gates. |
| `test_default_behaviour_preserved_without_script` | Byte-equivalent default: workflows without the flag PAUSE on gates in non-TTY exactly as before. |
| `test_non_matching_script_entry_falls_back_to_prompt` | Partial-script contract: gates without an entry fall back normally. |
| `test_scripted_reject_with_abort_halts_run` | Scripted `reject` + `on_reject: abort` → `ABORTED`, same as operator-driven. |

AI Disclosure

- [ ] I did not use AI assistance for this contribution
- [x] I did use AI assistance (described below)

Used Claude Opus to draft the loader module, the engine wiring,
the test suite, the docs section, and this PR body. The proposed
shape (speckit.gate-script/v1 + --gate-script flag) was
specified in issue #2594; this PR implements that proposal. Code,
tests, and design decisions were human-reviewed before
submission.

Update (post-open audit)

Two improvements after a self-review pass:

`--gate-script` is now also accepted by `workflow resume`

The engine API already supported it
(WorkflowEngine.resume(gate_script=...)), but the CLI didn't
expose the flag, which was a real consistency gap.
specify workflow resume <run_id> --gate-script verdicts.yaml now
works the same way as workflow run --gate-script, so a scripted
CI run can also drive gates that fire only after a prior pause.

```bash
specify workflow run my-pipeline --gate-script verdicts.yaml
specify workflow resume <run_id> --gate-script verdicts.yaml
```

Validator rejects duplicate `(gate_id, iteration)` pairs

lookup_scripted_verdict returns the first match, so two entries
sharing the same (gate_id, iteration) would silently shadow each
other — almost always a copy-paste authoring mistake. The parser
now fails fast at load time. Two new tests pin this:

- test_parse_rejects_duplicate_gate_id_iteration
- test_parse_allows_same_gate_id_different_iterations (the positive
  case: distinct iterations remain valid, which is how the
  improve→approve pattern works)
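Because lookup returns the first match, the duplicate check amounts to counting (gate_id, iteration) keys before any lookup happens. A minimal sketch, with find_duplicate_pairs and check_no_duplicates as hypothetical names; the PR folds this check into parse_gate_script, and the plain ValueError here stands in for whatever error type the real parser raises.

```python
from collections import Counter


def find_duplicate_pairs(verdicts: list[dict]) -> list[tuple[str, int]]:
    """Return each (gate_id, iteration) pair that occurs more than once."""
    counts = Counter((entry["gate_id"], entry["iteration"]) for entry in verdicts)
    return sorted(pair for pair, seen in counts.items() if seen > 1)


def check_no_duplicates(verdicts: list[dict]) -> None:
    """Fail fast at load time instead of letting a later entry be shadowed."""
    duplicates = find_duplicate_pairs(verdicts)
    if duplicates:
        raise ValueError(f"duplicate (gate_id, iteration) entries: {duplicates}")


# Same gate_id with distinct iterations is the improve→approve pattern: valid.
improve_then_approve = [
    {"gate_id": "review-overview", "iteration": 0, "verdict": "improve"},
    {"gate_id": "review-overview", "iteration": 1, "verdict": "approve"},
]
check_no_duplicates(improve_then_approve)
```

Reporting every offending pair, rather than stopping at the first, lets an author fix all copy-paste mistakes in one pass.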

Suite is now 2982 passed (was 2960 baseline; +22 new tests
covering this PR).

Docs

workflows/README.md gains a "Non-Interactive Gate Testing" section documenting the CLI flag, the schema, the matching rules, and the output.scripted distinguisher.

doquanghuy · 2026-05-22T02:42:20Z

Closing per @mnriem's feedback on the parent issue (#2594): the existing type: if step + a ci_mode boolean input cleanly covers the CI / non-interactive use case without a new CLI flag or schema. The new surface this PR introduces (--gate-script, speckit.gate-script/v1, StepContext.gate_script, output.scripted) isn't justified once the existing-primitives pattern is in view.

Branch retained on the fork for reference; no further work planned here.

Thanks for the consideration.

doquanghuy requested a review from mnriem as a code owner May 21, 2026 16:33

doquanghuy force-pushed the feat/gate-script branch from a980576 to 5172166 May 21, 2026 16:41

doquanghuy mentioned this pull request May 22, 2026 in #2594, "[Feature]: Document a test harness for scripting gate verdicts non-interactively" (closed)

doquanghuy closed this May 22, 2026


doquanghuy wants to merge 1 commit into github:main from doquanghuy:feat/gate-script
