65 changes: 65 additions & 0 deletions .agents/skills/systematic-qa/SKILL.md
@@ -0,0 +1,65 @@
---
name: systematic-qa
description: Executes full-project QA like a real user by discovering the repository verification contract, running build, lint, test, and startup commands, exercising core workflows end-to-end, creating realistic fixtures when needed, fixing root-cause regressions, and rerunning the full gate. Use when validating a branch, release candidate, migration, refactor, or risky commit. Do not use for static code review only, one-off unit test edits, or architecture brainstorming without execution.
---

# Systematic Project QA

## Procedures

**Step 1: Discover the Repository QA Contract**

1. Read root instructions, repository docs, and CI/build files before running commands.
2. Execute `python3 scripts/discover-project-contract.py --root .` to surface candidate install, verify, build, test, lint, and start commands.
3. Prefer repository-defined umbrella commands such as `make verify`, `just verify`, or CI entrypoints over language-default commands.
4. Read `references/project-signals.md` when command ownership is ambiguous or when multiple ecosystems are present.
5. Identify the changed surface and the regression-critical surface before choosing scenarios.
6. Choose a QA artifact location using repository conventions. If the repository has no QA artifact convention, store scratch artifacts under `/tmp/codex-qa-<slug>`.
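
The discovery script itself is not shown in this diff, so the following is only a minimal sketch of what command discovery could look like, not the contents of `scripts/discover-project-contract.py`. The function names and the restriction to `Makefile` and `package.json` are assumptions made for illustration.

```python
import json
import re
from pathlib import Path

# Assumption: these umbrella names mirror the high-confidence targets
# listed in references/project-signals.md.
UMBRELLA_TARGETS = ("verify", "check", "ci")


def makefile_targets(makefile_text: str) -> list[str]:
    """Return names that start a rule line (`name:`), skipping `:=` assignments."""
    pattern = re.compile(r"^([A-Za-z0-9_.-]+)\s*:(?!=)", re.MULTILINE)
    return pattern.findall(makefile_text)


def discover_candidates(root: Path) -> list[str]:
    """Collect candidate gate commands, Makefile umbrella targets first."""
    candidates: list[str] = []
    makefile = root / "Makefile"
    if makefile.is_file():
        targets = makefile_targets(makefile.read_text(encoding="utf-8"))
        candidates += [f"make {t}" for t in UMBRELLA_TARGETS if t in targets]
    package_json = root / "package.json"
    if package_json.is_file():
        manifest = json.loads(package_json.read_text(encoding="utf-8"))
        scripts = manifest.get("scripts", {})
        candidates += [f"npm run {s}" for s in UMBRELLA_TARGETS if s in scripts]
    return candidates
```

Calling `discover_candidates(Path("."))` from the repository root would return the umbrella commands in preference order, which is the input the later steps need when choosing the canonical gate.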

**Step 2: Define the QA Scope**

1. Build a short execution matrix covering baseline verification, changed workflows, and unchanged business-critical workflows.
2. Read `references/checklist.md` and ensure every required category has a planned validation.
3. Prefer public entry points such as CLI commands, HTTP endpoints, browser flows, worker jobs, and documented setup commands over internal test helpers.
4. Create the smallest realistic fixture or fake project needed to exercise the workflow when the repository does not already include one.
5. Treat mocks as a local unit-test boundary only. Do not use mocks or stubs as final proof that a user flow works.
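
One way to keep the execution matrix honest is to model it as data and check that each of the three coverage areas from item 1 has at least one planned scenario. This is a sketch under assumptions: the `Category`, `Scenario`, and `missing_categories` names are hypothetical, and the sample entries are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    BASELINE = "baseline verification"
    CHANGED = "changed workflow"
    REGRESSION = "unchanged business-critical workflow"


@dataclass(frozen=True)
class Scenario:
    category: Category
    entry_point: str  # public interface, e.g. a CLI command or HTTP endpoint
    expectation: str  # observable result that counts as a pass


def missing_categories(matrix: list[Scenario]) -> set[Category]:
    """Return the categories that have no planned scenario yet."""
    return set(Category) - {scenario.category for scenario in matrix}


# Hypothetical matrix: the commands and endpoint below are placeholders.
matrix = [
    Scenario(Category.BASELINE, "make verify", "exit code 0"),
    Scenario(Category.CHANGED, "POST /api/orders", "201 response with an order id"),
]
print(missing_categories(matrix))
```

Here the printed set contains only the regression category, which signals that an unchanged business-critical workflow still needs a planned validation.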

**Step 3: Establish the Baseline**

1. Install dependencies with the repository-preferred command before testing runtime flows.
2. Run the canonical verification gate once before scenario testing to establish baseline health.
3. If the baseline fails, read the first failing output carefully and determine whether the failure is pre-existing or was introduced by the current work before moving on.
4. Start services in the closest supported production-like mode and confirm readiness through observable signals such as health checks, startup logs, or successful handshakes.
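
Readiness confirmation through an observable signal can be sketched as a polling loop with a deadline. The helper names below are hypothetical, and the HTTP probe assumes the service exposes a health endpoint that answers with status 200, which the repository under test may or may not provide.

```python
import time
import urllib.request
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll `probe` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused and similar errors mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def http_ok(url: str) -> bool:
    """Probe that succeeds when the URL answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return response.status == 200
```

For example, `wait_until_ready(lambda: http_ok("http://127.0.0.1:8080/healthz"))` would poll a placeholder local health endpoint. Accepting the probe as a callable keeps the loop reusable for log-line or handshake checks.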

**Step 4: Execute User-Like Flows**

1. Drive workflows through the same interfaces a real operator or user would use.
2. Capture the exact command, input, and observable result for each scenario.
3. Validate changed features first, then validate at least one regression-critical flow outside the changed surface.
4. Exercise live integrations when credentials and local prerequisites exist. When they do not, validate every reachable local boundary and record the blocked live step explicitly.
5. Re-run the scenario from a clean state when the first attempt leaves the environment in an ambiguous state.
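
Capturing the exact command, input, and observable result can be done with a thin wrapper around `subprocess.run`. This is a minimal sketch. The `Evidence` record and `run_scenario` helper are hypothetical names, and the sample command is a stand-in for a real project entry point.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    command: list[str]
    stdin: str
    exit_code: int
    stdout: str
    stderr: str


def run_scenario(command: list[str], stdin: str = "", timeout_s: float = 300.0) -> Evidence:
    """Run one scenario and keep the exact command, input, and observable result."""
    completed = subprocess.run(
        command,
        input=stdin,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # a non-zero exit is evidence, not an exception
    )
    return Evidence(command, stdin, completed.returncode, completed.stdout, completed.stderr)


# Stand-in scenario: a real run would invoke the project's CLI instead.
evidence = run_scenario([sys.executable, "-c", "print(input().upper())"], stdin="hello\n")
print(evidence.exit_code, evidence.stdout.strip())
```

Keeping `check=False` is a deliberate choice in this sketch: a failing scenario still yields a complete record that can be pasted into an issue or report.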

**Step 5: Diagnose and Fix Regressions**

1. Reproduce each failure consistently before proposing a fix.
2. Activate companion debugging and test-hygiene skills when available, especially root-cause debugging and anti-workaround guidance.
3. Add or update the narrowest regression test that proves the bug when the repository supports automated coverage for that surface.
4. Fix production code or real configuration at the source of the failure. Do not weaken tests to match broken behavior.
5. Re-run the narrow reproduction, the impacted scenario, and the baseline gate after each fix.
6. Use `assets/issue-template.md` when the user wants persisted issue files or when the repository already has a QA issue convention.

**Step 6: Verify the Final State**

1. Re-run the full repository verification gate from scratch after the last code change.
2. Re-run the most important user-like scenarios after the full gate passes.
3. Summarize the evidence using `assets/verification-report-template.md`.
4. Report blocked scenarios, missing credentials, or environment gaps with the exact command or prerequisite that stopped execution.
5. Do not claim completion without fresh verification evidence from the current state of the repository.
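
The report fields in `assets/verification-report-template.md` can be filled programmatically from a fresh run. The sketch below assumes that a zero exit code with no recorded errors is enough for a PASS verdict. Real projects may need stricter criteria, and `render_report` is a hypothetical helper rather than part of this skill.

```python
from collections.abc import Sequence
from datetime import datetime, timezone


def render_report(
    claim: str,
    command: str,
    exit_code: int,
    output_summary: str,
    warnings: Sequence[str] = (),
    errors: Sequence[str] = (),
) -> str:
    """Fill the verification report fields from one fresh command run."""

    def listing(items: Sequence[str]) -> str:
        return "; ".join(items) if items else "none"

    # Assumption: the verdict follows the exit code and recorded errors only.
    verdict = "PASS" if exit_code == 0 and not errors else "FAIL"
    executed = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return "\n".join([
        "VERIFICATION REPORT",
        "-------------------",
        f"Claim: {claim}",
        f"Command: `{command}`",
        f"Executed: {executed}",
        f"Exit code: {exit_code}",
        f"Output summary: {output_summary}",
        f"Warnings: {listing(warnings)}",
        f"Errors: {listing(errors)}",
        f"Verdict: {verdict}",
    ])
```

Generating the timestamp at render time ties the report to the moment of execution, which supports the requirement that evidence come from the current state of the repository.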

## Error Handling

- If command discovery returns multiple plausible gates, prefer the broadest repository-defined command and explain the tie-breaker.
- If no canonical verify command exists, read `references/project-signals.md`, choose the broadest safe install, lint, test, and build commands for the detected ecosystem, and state that assumption explicitly.
- If a required live dependency is unavailable, validate every local boundary that does not require the missing dependency and report the blocked live validation separately.
- If a workflow requires data or services absent from the repository, create the smallest realistic fixture outside the main source tree unless the repository has its own fixture convention.
- If a failure appears unrelated to the requested change, prove that with a clean reproduction before excluding it from the QA scope.
32 changes: 32 additions & 0 deletions .agents/skills/systematic-qa/assets/issue-template.md
@@ -0,0 +1,32 @@
# Issue <num>: <short-title>

## Summary

<Describe the observable failure in one short paragraph.>

## Reproduction

```bash
<exact command or sequence>
```

Observed before the fix:

- <observable result>

## Expected

<Describe the correct behavior.>

## Root cause

<Describe the actual source of the failure, not the symptom.>

## Fix

<Describe the production change that fixed the root cause.>

## Verification

- <narrow reproduction rerun>
- <broader regression or full gate rerun>
@@ -0,0 +1,10 @@
VERIFICATION REPORT
-------------------
Claim: <what is being claimed>
Command: `<full verification command>`
Executed: <timestamp or relative time>
Exit code: <0 or non-zero>
Output summary: <key pass/fail lines, counts, build result>
Warnings: <none or list>
Errors: <none or list>
Verdict: <PASS or FAIL>
36 changes: 36 additions & 0 deletions .agents/skills/systematic-qa/references/checklist.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# Systematic Project QA Checklist

Mark every item as complete before claiming the QA pass is done.

## Contract Discovery

- [ ] Root instructions and repository docs were read
- [ ] The canonical verify gate was identified or an explicit fallback was chosen
- [ ] The changed surface and regression-critical surface were identified

## Baseline

- [ ] Dependencies were installed with the repository-preferred command
- [ ] The baseline verification gate was run before scenario testing
- [ ] Any pre-existing failures were isolated with evidence

## User-Like Validation

- [ ] Changed workflows were exercised through public interfaces
- [ ] At least one unchanged regression-critical workflow was exercised
- [ ] Runtime readiness was confirmed with observable signals
- [ ] Fixtures or fake projects were realistic and minimal

## Regression Handling

- [ ] Every failure was reproduced before fixing
- [ ] Root cause was identified before implementation
- [ ] Regression coverage was added or updated when the repository supported it
- [ ] The narrow repro and impacted flows were rerun after each fix

## Final Verification

- [ ] The full verification gate was rerun after the last code change
- [ ] The most important user-like flows were rerun after the final gate
- [ ] A verification report was produced from fresh evidence
- [ ] Blocked scenarios or missing prerequisites were disclosed explicitly
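
A filled-in copy of this checklist can be checked mechanically before a QA pass is declared done. The following is a minimal sketch, assuming the checklist keeps the `- [ ]` and `- [x]` task-list syntax used above. `unchecked_items` is a hypothetical helper name.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def unchecked_items(checklist_markdown: str) -> list[str]:
    """Return the text of every checklist item that is still unticked."""
    pending = []
    for line in checklist_markdown.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            pending.append(match.group(2).strip())
    return pending
```

An empty return value means every item was marked complete. Any remaining entries name the exact validations that are still outstanding.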
57 changes: 57 additions & 0 deletions .agents/skills/systematic-qa/references/project-signals.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
# Project Signal Guide

Use this guide when repository instructions do not already define the canonical QA contract.

## Priority Order

1. Root instructions such as `AGENTS.md`, `CLAUDE.md`, or repository-specific agent docs
2. Dedicated umbrella commands in `Makefile`, `Justfile`, task runners, or CI wrapper scripts
3. CI workflows under `.github/workflows/`
4. Ecosystem-native manifests such as `package.json`, `go.mod`, `pyproject.toml`, or `Cargo.toml`
5. Language-default commands as a last resort

## Common Signals

### Makefile or Justfile

Treat `verify`, `check`, `ci`, `test`, `lint`, `build`, `start`, `run`, and `dev` as high-confidence targets.

### package.json

Prefer explicit scripts in this order:

1. `verify`, `check`, `ci`
2. `test`, `test:ci`, `test:e2e`, `test:integration`
3. `lint`, `typecheck`
4. `build`
5. `start`, `dev`, `serve`, `preview`
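
The ordering above can be applied to a manifest with a short lookup. This is a sketch only. The `SCRIPT_TIERS` constant and `rank_scripts` function are hypothetical names, and the tiers simply restate the list above.

```python
import json

# Priority tiers copied from the list above, highest first.
SCRIPT_TIERS = [
    ("verify", "check", "ci"),
    ("test", "test:ci", "test:e2e", "test:integration"),
    ("lint", "typecheck"),
    ("build",),
    ("start", "dev", "serve", "preview"),
]


def rank_scripts(package_json_text: str) -> list[str]:
    """Return the declared scripts that match a tier, in priority order."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return [name for tier in SCRIPT_TIERS for name in tier if name in scripts]
```

Scripts outside the tiers, such as a `format` script, are ignored by this sketch and would need a manual decision.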

### Go modules

If no umbrella command exists, treat `go test ./...`, `go build ./...`, and repository formatting/lint commands as the minimum baseline. Prefer repository wrappers over direct Go commands when both exist.

### Python projects

Look for `pytest`, `tox`, `nox`, `ruff`, `mypy`, `python -m build`, and any scripts declared in `pyproject.toml`.

### Rust projects

Treat `cargo test`, `cargo build`, `cargo fmt --check`, and `cargo clippy --all-targets --all-features -- -D warnings` as strong defaults when the repository does not define wrappers.

### Mixed Repositories

When multiple ecosystems exist, identify the product entrypoint first. Do not assume every manifest is part of the same runtime surface.
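
For mixed repositories, it can help to list each manifest together with its own fallback commands rather than merging them into one gate. The sketch below covers only Go and Rust because those are the sections above that quote exact commands. `FALLBACK_COMMANDS` and `fallback_gates` are hypothetical names, and the recursive search is an assumption that may need to exclude vendored directories.

```python
from pathlib import Path

# Defaults quoted from the Go and Rust sections above; other ecosystems are
# left out because the guide does not give exact commands for them.
FALLBACK_COMMANDS = {
    "go.mod": ["go test ./...", "go build ./..."],
    "Cargo.toml": [
        "cargo test",
        "cargo build",
        "cargo fmt --check",
        "cargo clippy --all-targets --all-features -- -D warnings",
    ],
}


def fallback_gates(root: Path) -> dict[str, list[str]]:
    """Map each manifest found under root (recursively) to its default commands."""
    found: dict[str, list[str]] = {}
    for manifest, commands in FALLBACK_COMMANDS.items():
        for path in sorted(root.rglob(manifest)):
            found[path.relative_to(root).as_posix()] = commands
    return found
```

Keying the result by manifest path keeps each runtime surface separate, so the product entrypoint can be identified before any commands are run.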

## Scenario Selection Rules

Always cover:

1. A baseline verification gate
2. The workflows directly touched by the change
3. At least one adjacent regression-critical workflow
4. Startup or readiness if the change can affect bootstrapping
5. A realistic fixture path if the feature consumes external projects, repos, files, or APIs

## Evidence Rules

Capture exact commands, inputs, outputs, and artifact paths. Prefer observable outcomes over interpretation.