feat(workflows): add --gate-script for non-interactive gate testing #2667

Closed
doquanghuy wants to merge 1 commit into github:main from doquanghuy:feat/gate-script

@doquanghuy doquanghuy commented May 21, 2026

Description

Closes #2594.

Adds a documented contract for scripting gate verdicts during
specify workflow run, so CI and test runners can drive workflows
through gated paths without operator interaction. Implements
shape A from the issue (--gate-script flag with a typed
speckit.gate-script/v1 schema).

Note for maintainers: the issue invited a polite close if
gate behaviour is intentionally manual-only. Submitting the PR
so reviewers can see the concrete shape — happy to close
cleanly if that's the call.

Why

Spec Kit's gate step prompts interactively for a verdict.
Without a documented way to script those verdicts:

  • Spec Kit's own CI cannot exercise gate behaviour end-to-end —
    changes to gate prompt format, output channel, or switch
    routing on verdict can regress without coverage.
  • Workflows authored on Spec Kit that depend on gate semantics
    cannot be CI-tested without hand-rolling a pexpect driver
    against an undocumented prompt format.

Canonical usage

specify workflow run my-pipeline --gate-script verdicts.yaml

# verdicts.yaml
schema: speckit.gate-script/v1
verdicts:
  - gate_id: review-overview
    iteration: 0
    verdict: improve
  - gate_id: review-overview
    iteration: 1
    verdict: approve
  - gate_id: review-final
    iteration: 0
    verdict: approve

Behaviour

  • gate_id matches the workflow YAML step id (the
    author-visible base id — engine-internal loop namespacing like
    parent:child:N is unwrapped automatically via
    extract_base_gate_id).
  • iteration selects which firing the verdict applies to
    (0-indexed, counting from the first time the gate runs within
    the current run; per-gate counter held on StepContext).
  • verdict is the option string the gate would otherwise
    produce — typically approve / reject / edit / a custom
    route name.
  • When no entry matches, the gate falls back to its normal
    behaviour (interactive prompt on TTY, PAUSED otherwise), so
    workflows can script only the gates they care about.
  • output.scripted records True for scripted verdicts and
    False for interactive ones, so downstream steps can
    distinguish them.
  • Scripted reject verdicts honour the gate's on_reject
    setting identically to operator-driven rejects (abort →
    ABORTED, skip → COMPLETED, retry → PAUSED).
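
The matching rules above can be sketched in a few lines of Python. This is a minimal illustration, not the PR's code. It assumes the parsed script is a list of dicts and that the counter value read before advancing is the iteration (so the first firing is iteration 0). fire_gate is a hypothetical helper standing in for the relevant part of GateStep.execute.

```python
from collections import defaultdict
from typing import Optional

# Assumed in-memory form of the parsed verdicts.yaml shown above.
SCRIPT = [
    {"gate_id": "review-overview", "iteration": 0, "verdict": "improve"},
    {"gate_id": "review-overview", "iteration": 1, "verdict": "approve"},
    {"gate_id": "review-final", "iteration": 0, "verdict": "approve"},
]


def lookup_scripted_verdict(script, base_id, iteration) -> Optional[str]:
    """Return the first verdict matching (base_id, iteration), else None."""
    for entry in script:
        if entry["gate_id"] == base_id and entry["iteration"] == iteration:
            return entry["verdict"]
    return None


def fire_gate(script, firing_counts, base_id):
    """Hypothetical helper: simulate one firing of the gate `base_id`."""
    iteration = firing_counts[base_id]  # 0-indexed firing number
    firing_counts[base_id] += 1         # advance the per-gate counter
    verdict = lookup_scripted_verdict(script, base_id, iteration)
    # verdict is None when no entry matches; the real gate would then
    # fall back to its prompt / PAUSED behaviour.
    return {"verdict": verdict, "scripted": verdict is not None}


counts = defaultdict(int)
firings = ["review-overview", "review-overview", "review-final", "review-overview"]
results = [fire_gate(SCRIPT, counts, gate) for gate in firings]
```

The fourth firing (a third pass through review-overview) has no entry, so it comes back unscripted. That is the partial-script case described in the list.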

Implementation

  • src/specify_cli/workflows/gate_script.py — new module with
    load_gate_script(path), parse_gate_script(data),
    lookup_scripted_verdict(script, base_id, iteration), and
    extract_base_gate_id(step_id) helpers. The schema and
    validator are in one place so the CLI loader, the engine
    consultation path, and tests share one contract.
  • StepContext gains two fields: gate_script (parsed list) and
    gate_firing_counts (per-base-gate-id counter). Both default
    to empty so existing constructions are unaffected.
  • WorkflowEngine.execute(...) and WorkflowEngine.resume(...)
    gain an optional gate_script parameter.
  • GateStep.execute consults the script at the start: it
    increments the firing counter for the current gate's base id,
    looks up the scripted verdict, and if found returns it directly
    (honouring on_reject for scripted rejects). When no entry
    matches, behaviour is byte-identical to before this change.
  • specify workflow run gains a --gate-script PATH option that
    loads and validates the script before passing it to the engine.
    Loader errors are fatal: the operator asked for a non-interactive
    run, so silently falling back to prompts on a broken script
    would be worse than failing fast.
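
As a rough sketch of what the validator might check, the function below mirrors the rules named in this PR (schema id, required fields, integer iteration). The GateScriptError class, the error messages, and the list-of-dicts return shape are assumptions for illustration. The PR text does not show the real implementation.

```python
SCHEMA_ID = "speckit.gate-script/v1"
REQUIRED_FIELDS = ("gate_id", "iteration", "verdict")


class GateScriptError(ValueError):
    """Hypothetical error type; the PR does not name its exception class."""


def parse_gate_script(data):
    """Validate an already-loaded gate-script mapping and return its entries."""
    if not isinstance(data, dict) or data.get("schema") != SCHEMA_ID:
        raise GateScriptError(f"schema must be {SCHEMA_ID!r}")
    verdicts = data.get("verdicts")
    if not isinstance(verdicts, list):
        raise GateScriptError("'verdicts' must be a list")

    parsed = []
    for index, entry in enumerate(verdicts):
        if not isinstance(entry, dict):
            raise GateScriptError(f"verdicts[{index}] must be a mapping")
        missing = [name for name in REQUIRED_FIELDS if name not in entry]
        if missing:
            raise GateScriptError(f"verdicts[{index}] is missing {missing}")
        iteration = entry["iteration"]
        # bool is a subclass of int, so True would pass a plain
        # isinstance(..., int) check; reject it explicitly.
        if isinstance(iteration, bool) or not isinstance(iteration, int):
            raise GateScriptError(f"verdicts[{index}].iteration must be an integer")
        parsed.append(
            {
                "gate_id": entry["gate_id"],
                "iteration": iteration,
                "verdict": entry["verdict"],
            }
        )
    return parsed
```

Raising on the first problem, rather than collecting warnings, matches the fail-fast stance taken for loader errors.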

Default behaviour preserved

Workflows that don't pass --gate-script see no difference. The
only new output field is output.scripted: False for unscripted
gates — additive, no existing field removed or renamed. Existing
gate tests use field-specific assertions rather than full output
shape, so nothing breaks.

Not in scope here

  • Shape B (documented prompt-format regex for pexpect drivers)
    from the issue. Shape A is more CI-friendly; B can be added
    later as a separate PR if there's demand.
  • Persisting the firing counter across resume. The counter
    lives on StepContext only, not RunState. For scripted CI
    runs this is fine (the run completes in one shot). Adding it to
    RunState is a follow-on if scripted gates need to survive
    pause-resume cycles.
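
The resume limitation can be illustrated with a toy model. It assumes that a resume builds a fresh context with empty counters, which is an inference from the counter living on StepContext rather than RunState. StepContextSketch and next_iteration are hypothetical names, not the engine's API.

```python
from dataclasses import dataclass, field


@dataclass
class StepContextSketch:
    """Hypothetical stand-in for StepContext; only the counter is modelled."""

    gate_firing_counts: dict = field(default_factory=dict)


def next_iteration(ctx, base_id):
    """Return the 0-indexed firing number for base_id and advance the counter."""
    iteration = ctx.gate_firing_counts.get(base_id, 0)
    ctx.gate_firing_counts[base_id] = iteration + 1
    return iteration


# One uninterrupted run: the gate fires twice and sees iterations 0 and 1.
first_run = StepContextSketch()
before_pause = [next_iteration(first_run, "review-overview") for _ in range(2)]

# Assumed resume behaviour: a new context, so counting starts again at 0.
resumed = StepContextSketch()
after_resume = next_iteration(resumed, "review-overview")
```

Under that assumption, a script used with a resumed run would need its iteration values written relative to the resume, not to the original start.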

Testing

  • Tested locally with uv run specify --help — the new
    --gate-script flag appears under workflow run.
  • Ran existing tests with uv sync && uv run pytest:
    2980 passed, 35 skipped (was 2960 before; +20 new
    tests added in this PR).
  • Tested with a sample workflow: ran a 2-gate workflow with a
    verdicts.yaml driving both verdicts to approve. Workflow
    completed end-to-end with zero operator input; state.step_results
    showed output.scripted = True on both gates. Re-ran the
    same workflow without --gate-script — first gate PAUSED
    as expected (non-TTY).

New test coverage

TestGateScriptLoader (14 unit tests):

| Test | What it locks |
|---|---|
| test_parse_valid_script_returns_verdicts | Happy path. |
| test_parse_rejects_wrong_schema | Schema gating. |
| test_parse_rejects_missing_verdicts | Required top-level field. |
| test_parse_rejects_non_mapping_verdict | Per-entry shape. |
| test_parse_rejects_missing_required_fields | Field completeness. |
| test_parse_rejects_non_int_iteration | Type strictness. |
| test_parse_rejects_bool_iteration | bool-is-a-subclass-of-int trap. |
| test_lookup_finds_matching_entry | Lookup semantics. |
| test_lookup_returns_none_for_no_match | Missing-entry contract. |
| test_lookup_empty_script_returns_none | Empty-script safety. |
| test_extract_base_gate_id_unwrapped | Identity when not namespaced. |
| test_extract_base_gate_id_unwraps_loop_namespacing | Inverts engine's parent:child:N namespacing. |
| test_load_from_file / test_load_missing_file_raises | File-loader path. |

TestGateScriptIntegration (6 end-to-end tests):

| Test | What it locks |
|---|---|
| test_gate_uses_scripted_verdict_instead_of_prompting | Core contract: script short-circuits prompt, output.scripted = True. |
| test_improve_then_approve_cycle_via_switch_unroll | Canonical CI scenario: two gates driven entirely by script. Decoupled from the still-open loop-namespacing bug (#2592) by using a switch-unrolled cycle rather than a do-while. |
| test_iteration_counter_increments_on_repeated_gate_id | Per-gate counter is independent across multiple gates. |
| test_default_behaviour_preserved_without_script | Byte-equivalent default: workflows without the flag PAUSE on gates in non-TTY exactly as before. |
| test_non_matching_script_entry_falls_back_to_prompt | Partial-script contract: gates without an entry fall back normally. |
| test_scripted_reject_with_abort_halts_run | Scripted reject + on_reject: abort → ABORTED, same as operator-driven. |
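
The on_reject routing that the last test exercises can be pictured as a lookup table. Only the abort / skip / retry to ABORTED / COMPLETED / PAUSED mapping comes from this PR's description. The RunStatus enum and status_after_reject function are hypothetical names used for illustration.

```python
from enum import Enum


class RunStatus(Enum):
    """Hypothetical status enum; the engine's real type is not shown here."""

    ABORTED = "aborted"
    COMPLETED = "completed"
    PAUSED = "paused"


# Mapping stated in the Behaviour section: the same table applies whether
# the reject came from the script or from an operator.
ON_REJECT_STATUS = {
    "abort": RunStatus.ABORTED,
    "skip": RunStatus.COMPLETED,
    "retry": RunStatus.PAUSED,
}


def status_after_reject(on_reject: str) -> RunStatus:
    """Return the run status a reject verdict leads to for this setting."""
    try:
        return ON_REJECT_STATUS[on_reject]
    except KeyError:
        raise ValueError(f"unknown on_reject setting: {on_reject!r}") from None
```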

AI Disclosure

  • [ ] I did not use AI assistance for this contribution
  • [x] I did use AI assistance (described below)

Used Claude Opus to draft the loader module, the engine wiring,
the test suite, the docs section, and this PR body. The proposed
shape (speckit.gate-script/v1 + --gate-script flag) was
specified in issue #2594; this PR implements that proposal. Code,
tests, and design decisions were human-reviewed before
submission.


Update (post-open audit)

Two improvements after a self-review pass:

--gate-script is now also accepted on workflow resume

The engine API already supported it
(WorkflowEngine.resume(gate_script=...)), but the CLI didn't
expose the flag, which was a real consistency gap. specify workflow
resume <run_id> --gate-script verdicts.yaml now works the same way as
workflow run --gate-script, so a scripted CI run can also drive
gates that fire only after a prior pause.

specify workflow run my-pipeline --gate-script verdicts.yaml
specify workflow resume <run_id> --gate-script verdicts.yaml

Validator rejects duplicate (gate_id, iteration) pairs

lookup_scripted_verdict returns the first match, so when two entries
share the same (gate_id, iteration), the later one would be silently
ignored, which is almost always a copy-paste authoring mistake. The
parser now fails fast at load time. Two new tests pin this:

  • test_parse_rejects_duplicate_gate_id_iteration
  • test_parse_allows_same_gate_id_different_iterations (positive
    case — distinct iterations remain valid, which is how the
    improve→approve pattern works)
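
A duplicate check of this kind could look like the sketch below. find_duplicate_keys is a hypothetical helper name, and the real parser may detect and report duplicates differently.

```python
def find_duplicate_keys(verdicts):
    """Return each (gate_id, iteration) pair that repeats an earlier entry."""
    seen = set()
    duplicates = []
    for entry in verdicts:
        key = (entry["gate_id"], entry["iteration"])
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates


# Same gate, distinct iterations: valid (the improve -> approve pattern).
cycle = [
    {"gate_id": "review-overview", "iteration": 0, "verdict": "improve"},
    {"gate_id": "review-overview", "iteration": 1, "verdict": "approve"},
]

# Same gate and same iteration twice: the second entry could never be reached.
shadowed = [
    {"gate_id": "review-overview", "iteration": 0, "verdict": "improve"},
    {"gate_id": "review-overview", "iteration": 0, "verdict": "approve"},
]

cycle_duplicates = find_duplicate_keys(cycle)
shadowed_duplicates = find_duplicate_keys(shadowed)
```

A loader built on this would raise when the returned list is non-empty, which keeps the first-match lookup unambiguous.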

Suite is now 2982 passed (was 2960 baseline; +22 new tests
covering this PR).

@doquanghuy doquanghuy requested a review from mnriem as a code owner May 21, 2026 16:33
Closes github#2594.

Adds a documented contract for scripting gate verdicts during
`specify workflow run`, so CI and test runners can drive workflows
through gated paths without operator interaction.

### Why

Spec Kit's `gate` step prompts interactively for a verdict. Without
a documented way to script those verdicts:

- Spec Kit's own CI cannot exercise gate behaviour end-to-end —
  changes to gate prompt format, output channel, or switch routing
  on verdict can regress without coverage.
- Workflows authored on Spec Kit that depend on gate semantics
  cannot be CI-tested without hand-rolling a `pexpect` driver
  against an undocumented prompt format.

### How it works

This PR implements shape A from issue github#2594 — `--gate-script` flag.

```bash
specify workflow run my-pipeline --gate-script verdicts.yaml
```

```yaml
# verdicts.yaml
schema: speckit.gate-script/v1
verdicts:
  - gate_id: review-overview
    iteration: 0
    verdict: improve
  - gate_id: review-overview
    iteration: 1
    verdict: approve
```

Behaviour:

- `gate_id` matches the workflow YAML step `id` (engine-internal
  loop namespacing like `parent:child:N` is unwrapped automatically
  via `extract_base_gate_id`).
- `iteration` selects which firing the verdict applies to
  (0-indexed, counting from the first time the gate runs within
  the current run; per-gate counter held on `StepContext`).
- `verdict` is the option string the gate would otherwise produce.
- When the engine finds no matching entry, the gate falls back to
  its normal behaviour (interactive prompt on TTY, `PAUSED`
  otherwise) — workflows can partially-script only the gates they
  care about.
- `output.scripted` records `True` for scripted verdicts so
  workflows can distinguish them downstream.
- Scripted `reject` verdicts honour the gate's `on_reject`
  setting identically to operator-driven rejects.

### Implementation

- `src/specify_cli/workflows/gate_script.py` — new module with
  `load_gate_script(path)`, `parse_gate_script(data)`,
  `lookup_scripted_verdict(script, base_id, iteration)`, and
  `extract_base_gate_id(step_id)` helpers. The schema
  (`speckit.gate-script/v1`) and validator are in one place so the
  CLI loader, the engine consultation path, and tests share one
  contract.
- `StepContext` gains two fields: `gate_script` (the parsed list)
  and `gate_firing_counts` (per-base-gate-id counter). Both
  default to empty so existing constructions are unaffected.
- `WorkflowEngine.execute(...)` and `WorkflowEngine.resume(...)`
  gain an optional `gate_script` parameter that flows into the
  StepContext.
- `GateStep.execute` consults the script at the start: it
  increments the firing counter for the current gate's base id,
  looks up the scripted verdict, and if found returns it directly
  (honouring `on_reject` for scripted rejects). If no entry
  matches, behaviour is byte-identical to before this change.
- `specify workflow run` gains a `--gate-script PATH` option that
  loads and validates the script before passing it to the engine.
  Loader errors are fatal: the operator asked for a non-interactive
  run, so silently falling back to prompts on a broken script
  would be worse than failing fast.

### Default behaviour preserved

Workflows that don't pass `--gate-script` see no difference. The
only new output field is `output.scripted: False` for unscripted
gates — additive, no existing field removed or renamed. Existing
gate tests assert specific fields rather than full output shape so
nothing breaks.

### Tests

`TestGateScriptLoader` (14 unit tests): schema validation, missing
fields, type checks (including the `bool`-is-a-subclass-of-`int`
trap), file loading, base-gate-id extraction (with and without
loop namespacing).

`TestGateScriptIntegration` (6 end-to-end tests):

| Test | What it locks |
|---|---|
| `test_gate_uses_scripted_verdict_instead_of_prompting` | Core contract — scripted verdict short-circuits the prompt, `output.scripted = True`. |
| `test_improve_then_approve_cycle_via_switch_unroll` | The canonical CI scenario from the issue — two gates driven entirely by script. Decoupled from the still-open loop-namespacing bug (github#2592) by using a switch-unrolled cycle. |
| `test_iteration_counter_increments_on_repeated_gate_id` | Per-gate counter is independent across multiple gates. |
| `test_default_behaviour_preserved_without_script` | Locks the byte-equivalent default — workflows without `--gate-script` PAUSE on gates in non-TTY exactly as before. |
| `test_non_matching_script_entry_falls_back_to_prompt` | Partial-script contract — gates without an entry fall back normally. |
| `test_scripted_reject_with_abort_halts_run` | Scripted `reject` + `on_reject: abort` → `ABORTED`, same as operator-driven. |

### Docs

`workflows/README.md` gains a "Non-Interactive Gate Testing"
section documenting the CLI flag, the schema, the matching rules,
and the `output.scripted` distinguisher.

### Not in scope here

- **Shape B** (documented prompt-format regex for pexpect drivers)
  from the issue. Shape A is more CI-friendly; B can be added
  later as a separate PR if there's demand.
- **Persisting the firing counter across resume.** The counter
  lives on `StepContext` only, not `RunState`. For scripted CI
  runs this is fine (the run completes in one shot). Adding it to
  `RunState` is a follow-on if scripted gates need to survive
  pause-resume cycles.
@doquanghuy
Author

Closing per @mnriem's feedback on the parent issue (#2594): the existing type: if step + a ci_mode boolean input cleanly covers the CI / non-interactive use case without a new CLI flag or schema. The new surface this PR introduces (--gate-script, speckit.gate-script/v1, StepContext.gate_script, output.scripted) isn't justified once the existing-primitives pattern is in view.

Branch retained on the fork for reference; no further work planned here.

Thanks for the consideration.

@doquanghuy doquanghuy closed this May 22, 2026

Development

Successfully merging this pull request may close these issues.

[Feature]: Document a test harness for scripting gate verdicts non-interactively
