feat(validator): add SOFT eval-coverage check (check #8) #481
Merged
Every skill under skills/ must ship a matching behavioural eval suite under tools/skill-evals/evals/<slug>/. The new validate_eval_coverage function surfaces missing suites as SOFT advisory violations, so in-flight eval PRs do not fail the gate while their branches are pending review.

Against the live repo, the check correctly flags the two skills that currently have in-flight eval branches (pr-management-quick-merge and setup-status) and is silent on all others. Eight new test cases cover the happy path, the missing-eval path, the missing-both-dirs paths, the soft-category membership, and the non-directory skip.

Addresses the Known Gap in specs/meta-and-quality-tooling.md: "Eval coverage is incomplete — skills added before the per-skill-eval convention have no suite." The check prevents future regressions.

Generated-by: Claude (Opus 4.7)
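For readers who have not opened the diff, the coverage rule described above can be sketched roughly as follows. This is an illustrative approximation, not the merged implementation: the `Violation` dataclass, the `repo_root` parameter, the message wording, and the decision to return no violations when skills/ is absent are all assumptions. Only the function name, the two directory layouts, the SOFT category, and the non-directory skip come from the description.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Violation:
    """Hypothetical record type; the real validator's type may differ."""
    category: str  # "SOFT" marks an advisory finding that does not fail the gate
    message: str


def validate_eval_coverage(repo_root: Path) -> list[Violation]:
    """Report every skill directory that lacks a same-named eval suite directory."""
    skills_dir = repo_root / "skills"
    evals_dir = repo_root / "tools" / "skill-evals" / "evals"

    # Assumption: with no skills/ directory there is nothing to check.
    if not skills_dir.is_dir():
        return []

    violations = []
    for entry in sorted(skills_dir.iterdir()):
        # Non-directory entries (for example a README) are not skills; skip them.
        if not entry.is_dir():
            continue
        # A missing evals root is handled too: is_dir() is simply False for every slug.
        if not (evals_dir / entry.name).is_dir():
            violations.append(
                Violation(
                    category="SOFT",
                    message=(
                        f"skill '{entry.name}' has no eval suite under "
                        f"tools/skill-evals/evals/{entry.name}/"
                    ),
                )
            )
    return violations
```

Under these assumptions, a repository whose skills/ contains `a/`, `b/`, and a stray file, with an eval suite only for `a`, yields one SOFT violation naming `b`.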
Member
Author
Correctness: [advisory] Check-number inconsistency: docstring:47 + test say "#8"; section comment:1660 says "check #9". Reconcile (see cross-PR note).
Security / Conventions: No findings
Member
Rebasing it :)
Member
The cool thing with agents is that they even replace "eight" with "nine" while resolving this conflict :)
3af6556 to b571a3a
Type of change
.claude/skills/<name>/ (eval fixtures updated below)
tools/<system>/*.md
tools/*/ with pyproject.toml
docs/, README.md, CONTRIBUTING.md
projects/_template/
prek, workflows, validators

Test plan
prek run --all-files passes
uv run pytest / ruff check / mypy passes
PYTHONPATH=tools/skill-evals/src python3 -m skill_evals.runner tools/skill-evals/evals/<skill>/
a regression test for the bug fixed / the behaviour added (see CONTRIBUTING.md)

RFC-AI-0004 compliance
<PROJECT>, <tracker>, <upstream>, <security-list> used in all skill / tool prose (the check-placeholders prek hook is the mechanical gate)

Linked issues

Notes for reviewers (optional)