feat(workflows): add --gate-script for non-interactive gate testing#2667
Closed
doquanghuy wants to merge 1 commit into
Closed
feat(workflows): add --gate-script for non-interactive gate testing#2667doquanghuy wants to merge 1 commit into
doquanghuy wants to merge 1 commit into
Conversation
Closes github#2594. Adds a documented contract for scripting gate verdicts during `specify workflow run`, so CI and test runners can drive workflows through gated paths without operator interaction. ### Why Spec Kit's `gate` step prompts interactively for a verdict. Without a documented way to script those verdicts: - Spec Kit's own CI cannot exercise gate behaviour end-to-end — changes to gate prompt format, output channel, or switch routing on verdict can regress without coverage. - Workflows authored on Spec Kit that depend on gate semantics cannot be CI-tested without hand-rolling a `pexpect` driver against an undocumented prompt format. ### How it works This PR implements shape A from issue github#2594 — `--gate-script` flag. ```bash specify workflow run my-pipeline --gate-script verdicts.yaml ``` ```yaml # verdicts.yaml schema: speckit.gate-script/v1 verdicts: - gate_id: review-overview iteration: 0 verdict: improve - gate_id: review-overview iteration: 1 verdict: approve ``` Behaviour: - `gate_id` matches the workflow YAML step `id` (engine-internal loop namespacing like `parent:child:N` is unwrapped automatically via `extract_base_gate_id`). - `iteration` selects which firing the verdict applies to (0-indexed, counting from the first time the gate runs within the current run; per-gate counter held on `StepContext`). - `verdict` is the option string the gate would otherwise produce. - When the engine finds no matching entry, the gate falls back to its normal behaviour (interactive prompt on TTY, `PAUSED` otherwise) — workflows can partially-script only the gates they care about. - `output.scripted` records `True` for scripted verdicts so workflows can distinguish them downstream. - Scripted `reject` verdicts honour the gate's `on_reject` setting identically to operator-driven rejects. ### Implementation - `src/specify_cli/workflows/gate_script.py` — new module with `load_gate_script(path)`, `parse_gate_script(data)`, `lookup_scripted_verdict(script, base_id, iteration)`, and `extract_base_gate_id(step_id)` helpers. The schema (`speckit.gate-script/v1`) and validator are in one place so the CLI loader, the engine consultation path, and tests share one contract. - `StepContext` gains two fields: `gate_script` (the parsed list) and `gate_firing_counts` (per-base-gate-id counter). Both default to empty so existing constructions are unaffected. - `WorkflowEngine.execute(...)` and `WorkflowEngine.resume(...)` gain an optional `gate_script` parameter that flows into the StepContext. - `GateStep.execute` consults the script at the start: it increments the firing counter for the current gate's base id, looks up the scripted verdict, and if found returns it directly (honouring `on_reject` for scripted rejects). If no entry matches, behaviour is byte-identical to before this change. - `specify workflow run` gains a `--gate-script PATH` option that loads + validates the script before passing it to the engine. Loader errors are fatal — the operator asked for non-interactive run, so silently falling back to prompts on a broken script would be worse than failing fast. ### Default behaviour preserved Workflows that don't pass `--gate-script` see no difference. The only new output field is `output.scripted: False` for unscripted gates — additive, no existing field removed or renamed. Existing gate tests assert specific fields rather than full output shape so nothing breaks. ### Tests `TestGateScriptLoader` (13 unit tests): schema validation, missing fields, type checks (including the `bool`-is-a-subclass-of-`int` trap), file loading, base-gate-id extraction (with and without loop namespacing). `TestGateScriptIntegration` (5 end-to-end tests): | Test | What it locks | |---|---| | `test_gate_uses_scripted_verdict_instead_of_prompting` | Core contract — scripted verdict short-circuits the prompt, `output.scripted = True`. | | `test_improve_then_approve_cycle_via_switch_unroll` | The canonical CI scenario from the issue — two gates driven entirely by script. Decoupled from the still-open loop-namespacing bug (github#2592) by using a switch-unrolled cycle. | | `test_iteration_counter_increments_on_repeated_gate_id` | Per-gate counter is independent across multiple gates. | | `test_default_behaviour_preserved_without_script` | Locks the byte-equivalent default — workflows without `--gate-script` PAUSE on gates in non-TTY exactly as before. | | `test_non_matching_script_entry_falls_back_to_prompt` | Partial-script contract — gates without an entry fall back normally. | | `test_scripted_reject_with_abort_halts_run` | Scripted `reject` + `on_reject: abort` → `ABORTED`, same as operator-driven. | ### Docs `workflows/README.md` gains a "Non-Interactive Gate Testing" section documenting the CLI flag, the schema, the matching rules, and the `output.scripted` distinguisher. ### Not in scope here - **Shape B** (documented prompt-format regex for pexpect drivers) from the issue. Shape A is more CI-friendly; B can be added later as a separate PR if there's demand. - **Persisting the firing counter across resume.** The counter lives on `StepContext` only, not `RunState`. For scripted CI runs this is fine (the run completes in one shot). Adding it to `RunState` is a follow-on if scripted gates need to survive pause-resume cycles.
a980576 to
5172166
Compare
3 tasks
Author
|
Closing per @mnriem's feedback on the parent issue (#2594): the existing Branch retained on the fork for reference; no further work planned here. Thanks for the consideration. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Closes #2594.
Adds a documented contract for scripting gate verdicts during
specify workflow run, so CI and test runners can drive workflowsthrough gated paths without operator interaction. Implements
shape A from the issue (
--gate-scriptflag with a typedspeckit.gate-script/v1schema).Why
Spec Kit's
gatestep prompts interactively for a verdict.Without a documented way to script those verdicts:
changes to gate prompt format, output channel, or switch
routing on verdict can regress without coverage.
cannot be CI-tested without hand-rolling a
pexpectdriveragainst an undocumented prompt format.
Canonical usage
Behaviour
gate_idmatches the workflow YAML stepid(theauthor-visible base id — engine-internal loop namespacing like
parent:child:Nis unwrapped automatically viaextract_base_gate_id).iterationselects which firing the verdict applies to(0-indexed, counting from the first time the gate runs within
the current run; per-gate counter held on
StepContext).verdictis the option string the gate would otherwiseproduce — typically
approve/reject/edit/ a customroute name.
behaviour (interactive prompt on TTY,
PAUSEDotherwise) —workflows can partially-script only the gates they care about.
output.scriptedrecordsTruefor scripted verdicts andFalsefor interactive ones, so downstream steps candistinguish them.
rejectverdicts honour the gate'son_rejectsetting identically to operator-driven rejects (
abort→ABORTED,skip→COMPLETED,retry→PAUSED).Implementation
src/specify_cli/workflows/gate_script.py— new module withload_gate_script(path),parse_gate_script(data),lookup_scripted_verdict(script, base_id, iteration), andextract_base_gate_id(step_id)helpers. The schema andvalidator are in one place so the CLI loader, the engine
consultation path, and tests share one contract.
StepContextgains two fields:gate_script(parsed list) andgate_firing_counts(per-base-gate-id counter). Both defaultto empty so existing constructions are unaffected.
WorkflowEngine.execute(...)andWorkflowEngine.resume(...)gain an optional
gate_scriptparameter.GateStep.executeconsults the script at the start: itincrements the firing counter for the current gate's base id,
looks up the scripted verdict, and if found returns it directly
(honouring
on_rejectfor scripted rejects). When no entrymatches, behaviour is byte-identical to before this change.
specify workflow rungains a--gate-script PATHoption thatloads + validates the script before passing it to the engine.
Loader errors are fatal — the operator asked for non-interactive
run, so silently falling back to prompts on a broken script
would be worse than failing fast.
Default behaviour preserved
Workflows that don't pass
--gate-scriptsee no difference. Theonly new output field is
output.scripted: Falsefor unscriptedgates — additive, no existing field removed or renamed. Existing
gate tests use field-specific assertions rather than full output
shape, so nothing breaks.
Not in scope here
from the issue. Shape A is more CI-friendly; B can be added
later as a separate PR if there's demand.
lives on
StepContextonly, notRunState. For scripted CIruns this is fine (the run completes in one shot). Adding it to
RunStateis a follow-on if scripted gates need to survivepause-resume cycles.
Testing
uv run specify --help— the new--gate-scriptflag appears underworkflow run.uv sync && uv run pytest→ 2980 passed, 35 skipped (was 2960 before; +20 new
tests added in this PR).
verdicts.yamldriving both verdicts toapprove. Workflowcompleted end-to-end with zero operator input;
state.step_resultsshowed
output.scripted = Trueon both gates. Re-ran thesame workflow without
--gate-script— first gatePAUSEDas expected (non-TTY).
New test coverage
TestGateScriptLoader(13 unit tests):test_parse_valid_script_returns_verdictstest_parse_rejects_wrong_schematest_parse_rejects_missing_verdictstest_parse_rejects_non_mapping_verdicttest_parse_rejects_missing_required_fieldstest_parse_rejects_non_int_iterationtest_parse_rejects_bool_iterationbool-is-a-subclass-of-inttrap.test_lookup_finds_matching_entrytest_lookup_returns_none_for_no_matchtest_lookup_empty_script_returns_nonetest_extract_base_gate_id_unwrappedtest_extract_base_gate_id_unwraps_loop_namespacingparent:child:Nnamespacing.test_load_from_file/test_load_missing_file_raisesTestGateScriptIntegration(5 end-to-end tests):test_gate_uses_scripted_verdict_instead_of_promptingoutput.scripted = True.test_improve_then_approve_cycle_via_switch_unrolldo-while.test_iteration_counter_increments_on_repeated_gate_idtest_default_behaviour_preserved_without_scripttest_non_matching_script_entry_falls_back_to_prompttest_scripted_reject_with_abort_halts_runreject+on_reject: abort→ABORTED, same as operator-driven.AI Disclosure
Used Claude Opus to draft the loader module, the engine wiring,
the test suite, the docs section, and this PR body. The proposed
shape (
speckit.gate-script/v1+--gate-scriptflag) wasspecified in issue #2594; this PR implements that proposal. Code,
tests, and design decisions were human-reviewed before
submission.
Update (post-open audit)
Two improvements after a self-review pass:
--gate-scriptnow also accepts onworkflow resumeThe engine API already supported it
(
WorkflowEngine.resume(gate_script=...)) but the CLI didn'texpose the flag — a real consistency gap.
specify workflow resume <run_id> --gate-script verdicts.yamlnow works the same way asworkflow run --gate-script, so a scripted CI run can also drivegates that fire only after a prior pause.
Validator rejects duplicate
(gate_id, iteration)pairslookup_scripted_verdictreturns the first match, so two entriessharing the same
(gate_id, iteration)would silently shadow eachother — almost always a copy-paste authoring mistake. The parser
now fails fast at load time. Two new tests pin this:
test_parse_rejects_duplicate_gate_id_iterationtest_parse_allows_same_gate_id_different_iterations(positivecase — distinct iterations remain valid, which is how the
improve→approve pattern works)
Suite is now 2982 passed (was 2960 baseline; +22 new tests
covering this PR).