diff --git a/docs/labels-and-capabilities.md b/docs/labels-and-capabilities.md index 448651c6..8d1e7e44 100644 --- a/docs/labels-and-capabilities.md +++ b/docs/labels-and-capabilities.md @@ -274,6 +274,7 @@ it implements multiple contracts (e.g. `tools/gmail` provides both | [`tools/spec-loop`](../tools/spec-loop/) | `substrate:framework-dev` | Spec-driven build loop runner (Ralph-style) for framework development | | [`tools/skill-evals`](../tools/skill-evals/) | `substrate:framework-dev` | Eval harness for skills; framework-dev infrastructure whose run output is governance evidence | | [`tools/skill-and-tool-validator`](../tools/skill-and-tool-validator/) | `substrate:framework-dev` | Skill-frontmatter and convention validator | +| [`tools/spec-inventory`](../tools/spec-inventory/) | `substrate:framework-dev` + `substrate:analytics` | Compact routing inventory for spec-loop prompts — summarizes specs, skills, and tool metadata so agents can choose relevant files before direct verification | | [`tools/spec-status-index`](../tools/spec-status-index/) | `substrate:framework-dev` + `substrate:analytics` | Index of spec / RFC implementation status — framework-dev substrate that also doubles as a governance/stats view (`analytics`) | | [`tools/vendor-neutrality-score`](../tools/vendor-neutrality-score/) | `substrate:framework-dev` + `substrate:analytics` | Deterministic vendor-neutrality score — reads each contract tool's `**Kind:**` / `**Vendor:**` metadata and scores per-contract + per-skill neutrality (`analytics`); backs the score block in [`docs/vendor-neutrality.md`](vendor-neutrality.md) | | [`tools/spec-validator`](../tools/spec-validator/) | `substrate:framework-dev` | Spec-frontmatter and body-section validator — counterpart to `skill-and-tool-validator` for `tools/spec-loop/specs/` | diff --git a/docs/vendor-neutrality.md b/docs/vendor-neutrality.md index 25f0f037..5950be5d 100644 --- a/docs/vendor-neutrality.md +++ b/docs/vendor-neutrality.md @@ -544,7 
+544,6 @@ approval on endpoint identity rather than vendor. Both appear in the generated block below. - **Overall vendor-neutrality score: 10/10 capability contracts (100%).** Generated by [`tools/vendor-neutrality-score`](../tools/vendor-neutrality-score/); re-run it to refresh this section. | Capability contract | Neutral? | Class | Backends today | Basis | @@ -572,7 +571,7 @@ Organization scope (declared, orthogonal to vendor): ASF = 14, agnostic = 49. **LLM / agent-integration neutrality** -**Agent harness: 20/20 substrate tools run under any harness unchanged (100%).** Substrate tools are Magpie's own machinery; each declares the agent harness it integrates with (`**Harness:**`), or `agnostic`. A tool is neutral when it is harness-agnostic or supports two or more harnesses; *coupled* when it targets a single harness. +**Agent harness: 21/21 substrate tools run under any harness unchanged (100%).** Substrate tools are Magpie's own machinery; each declares the agent harness it integrates with (`**Harness:**`), or `agnostic`. A tool is neutral when it is harness-agnostic or supports two or more harnesses; *coupled* when it targets a single harness. | Substrate tool | Substrate | Harness support | Verdict | |---|---|---|---| @@ -591,6 +590,7 @@ Organization scope (declared, orthogonal to vendor): ASF = 14, agnostic = 49. 
| `security-tracker-stats-dashboard` | analytics | any | ✅ agnostic | | `skill-and-tool-validator` | framework-dev | any | ✅ agnostic | | `skill-evals` | framework-dev | any | ✅ agnostic | +| `spec-inventory` | framework-dev, analytics | any | ✅ agnostic | | `spec-loop` | framework-dev | Claude Code, Codex, Cursor, Gemini CLI, OpenCode | ✅ portable | | `spec-status-index` | framework-dev, analytics | any | ✅ agnostic | | `spec-validator` | framework-dev | any | ✅ agnostic | @@ -604,7 +604,7 @@ Harness → substrate tools it supports: - **Cursor** (1): `spec-loop` - **Gemini CLI** (1): `spec-loop` - **OpenCode** (5): `agent-guard`, `agent-isolation`, `permission-audit`, `sandbox-lint`, `spec-loop` -- **any harness** (15): `dashboard-generator`, `dev`, `egress-gateway`, `pilot-report-validator`, `pr-management-stats`, `preflight-audit`, `privacy-llm`, `probe-templates`, `security-tracker-stats-dashboard`, `skill-and-tool-validator`, `skill-evals`, `spec-status-index`, `spec-validator`, `symlink-lint`, `vendor-neutrality-score` +- **any harness** (16): `dashboard-generator`, `dev`, `egress-gateway`, `pilot-report-validator`, `pr-management-stats`, `preflight-audit`, `privacy-llm`, `probe-templates`, `security-tracker-stats-dashboard`, `skill-and-tool-validator`, `skill-evals`, `spec-inventory`, `spec-status-index`, `spec-validator`, `symlink-lint`, `vendor-neutrality-score` **Model endpoint: neutral by construction — 4 default-approved endpoint classes across independent trust domains, plus adopter opt-in.** From the [`privacy-llm` registry](../tools/privacy-llm/models.md): the framework keys approval on *endpoint identity*, not on who hosts the model, so no single LLM vendor is privileged. 
@@ -616,7 +616,6 @@ Harness → substrate tools it supports: | Air-gapped on-prem | A PMC-hosted inference appliance on a private VLAN | Every other endpoint is **opt-in** — the adopting project's security team declares it in `/privacy-llm.md` (endpoint URL, data-residency contract, approver), so the choice is local and audited. - ### What the number means diff --git a/pyproject.toml b/pyproject.toml index e64ea59f..6687cda3 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -114,6 +114,7 @@ members = [ "tools/skill-and-tool-validator", "tools/skill-evals", "tools/pilot-report-validator", + "tools/spec-inventory", "tools/spec-status-index", "tools/spec-validator", "tools/symlink-lint", diff --git a/tools/spec-inventory/README.md b/tools/spec-inventory/README.md new file mode 100644 index 00000000..33ebc587 --- /dev/null +++ b/tools/spec-inventory/README.md @@ -0,0 +1,51 @@ + + +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [spec-inventory](#spec-inventory) + - [Prerequisites](#prerequisites) + - [Usage](#usage) + - [Run tests](#run-tests) + + + + + +# spec-inventory + +**Capability:** substrate:framework-dev + substrate:analytics + +**Harness:** agnostic + +A deterministic `uv` tool that emits a compact routing inventory for the +spec-loop prompts. It summarizes spec frontmatter, where-it-lives hints, +validation commands, known gaps, skill frontmatter, and tool/test +presence so agents can choose what to read next without first scanning +the whole repository. + +The output is a routing aid, not proof. Prompts still require direct file +reads or code search before declaring behaviour present or absent. + +## Prerequisites + +- **Runtime:** Python 3.11+ run via `uv`; stdlib-only (no runtime + dependencies). The `dev` group pulls `pytest`, `ruff`, and `mypy`. +- **CLIs:** None beyond the runtime. +- **Credentials / auth:** None. 
+- **Network:** Runs fully offline; reads local specs, skills, and tool + metadata from the repository checkout. + +## Usage + +```bash +uv run --project tools/spec-inventory spec-inventory +uv run --project tools/spec-inventory spec-inventory --brief --max-where 1 --max-validation 1 --max-gaps 1 +uv run --project tools/spec-inventory spec-inventory --json +``` + +## Run tests + +```bash +uv run --project tools/spec-inventory --group dev pytest tools/spec-inventory/tests +``` diff --git a/tools/spec-inventory/pyproject.toml b/tools/spec-inventory/pyproject.toml new file mode 100644 index 00000000..a31ec917 --- /dev/null +++ b/tools/spec-inventory/pyproject.toml @@ -0,0 +1,75 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "spec-inventory" +version = "0.1.0" +description = "Generate a compact routing inventory for the spec-loop prompts." 
+readme = "README.md" +requires-python = ">=3.11" +license = { text = "Apache-2.0" } +dependencies = [] + +[project.scripts] +spec-inventory = "spec_inventory:main" + +[tool.hatch.build.targets.wheel] +packages = ["src/spec_inventory"] + +[tool.ruff] +line-length = 110 +target-version = "py311" +src = ["src", "tests"] + +[tool.ruff.lint] +select = ["E", "W", "F", "I", "B", "UP", "SIM", "C4", "RUF"] +ignore = ["E501"] + +[tool.ruff.lint.per-file-ignores] +"tests/**" = ["B", "SIM"] + +[tool.mypy] +python_version = "3.11" +files = ["src", "tests"] +warn_unused_ignores = true +warn_redundant_casts = true +warn_unreachable = true +check_untyped_defs = true +no_implicit_optional = true +disallow_untyped_defs = true +disallow_incomplete_defs = true + +[[tool.mypy.overrides]] +module = "tests.*" +disallow_untyped_defs = false +disallow_incomplete_defs = false + +[dependency-groups] +dev = [ + "mypy>=2.1.0", + "pytest>=9.1.1", + "ruff>=0.15.18", +] + +[tool.pytest.ini_options] +minversion = "8.0" +addopts = "-ra -q" +testpaths = ["tests"] diff --git a/tools/spec-inventory/src/spec_inventory/__init__.py b/tools/spec-inventory/src/spec_inventory/__init__.py new file mode 100644 index 00000000..03851a24 --- /dev/null +++ b/tools/spec-inventory/src/spec_inventory/__init__.py @@ -0,0 +1,400 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Generate a compact routing inventory for the spec-loop. + +The inventory is intentionally shallow: enough for an agent to choose the +next relevant files, not enough to replace direct verification. It is +stdlib-only so the loop can run it cheaply through ``uv``. +""" + +from __future__ import annotations + +import argparse +import json +import re +import sys +from dataclasses import asdict, dataclass +from pathlib import Path + +SPECS_DIR = Path("tools/spec-loop/specs") +SKILLS_DIR = Path("skills") +TOOLS_DIR = Path("tools") + +SKIP_SPEC_FILES = frozenset({"README.md", "overview.md"}) +SKIP_TOOL_DIRS = frozenset({"spec-loop"}) + +_HTML_COMMENT_RE = re.compile(r"") +_FENCED_CODE_RE = re.compile(r"^ {0,3}```(?:\w+)?\n([\s\S]*?)^ {0,3}```", re.MULTILINE) +_HEADING_RE = re.compile(r"^##\s+(.+?)\s*$", re.MULTILINE) +_SCRIPT_RE = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*=\s*\"([^\"]+)\"", re.MULTILINE) +_SECTION_RE = re.compile(r"^\[([^\]]+)\]\s*$") +_YAML_BLOCK_SCALAR_HEADERS = frozenset({"|", ">", "|-", "|+", ">-", ">+"}) + + +@dataclass +class SpecSummary: + file: str + title: str + status: str + mode: str + kind: str + where: list[str] + validation: list[str] + known_gaps: list[str] + + +@dataclass +class SkillSummary: + file: str + name: str + mode: str + capability: str + organization: str + description: str + + +@dataclass +class ToolSummary: + path: str + has_pyproject: bool + has_tests: bool + scripts: list[str] + + +@dataclass +class Inventory: + specs: list[SpecSummary] + skills: list[SkillSummary] + tools: list[ToolSummary] + + +def find_repo_root(start: Path) -> Path: + """Return the nearest parent containing .git.""" + path = start.resolve() + while path != path.parent: + if (path / ".git").exists(): + return path + path = path.parent + raise RuntimeError(f"Could not find repo root (.git) from {start}") + + +def _frontmatter_bounds(text: str) -> 
tuple[int, int] | None: + idx = text.find("---\n") + if idx == -1: + return None + if _HTML_COMMENT_RE.sub("", text[:idx]).strip(): + return None + try: + end = text.index("\n---\n", idx + 4) + except ValueError: + return None + return (idx + 4, end) + + +def parse_frontmatter(text: str) -> dict[str, str]: + """Parse top-level YAML-ish frontmatter without external dependencies.""" + bounds = _frontmatter_bounds(text) + if bounds is None: + return {} + block = text[bounds[0] : bounds[1]] + result: dict[str, str] = {} + current_key: str | None = None + current_value_lines: list[str] = [] + + for raw_line in block.splitlines(): + line = raw_line.rstrip() + if line == "": + if current_key is not None: + current_value_lines.append("") + continue + if not line.startswith((" ", "\t")) and ":" in line: + if current_key is not None: + result[current_key] = "\n".join(current_value_lines).strip() + key, _, value = line.partition(":") + current_key = key.strip() + inline = value.strip() + current_value_lines = [inline] if inline and inline not in _YAML_BLOCK_SCALAR_HEADERS else [] + continue + if current_key is not None: + stripped = line[2:] if line.startswith(" ") else line.strip() + current_value_lines.append(stripped) + + if current_key is not None: + result[current_key] = "\n".join(current_value_lines).strip() + return result + + +def _body_after_frontmatter(text: str) -> str: + bounds = _frontmatter_bounds(text) + if bounds is None: + return text + return text[bounds[1] + 5 :] + + +def get_section_body(text: str, section: str) -> str: + """Return a ## section body, or an empty string.""" + body = _body_after_frontmatter(text) + headings = list(_HEADING_RE.finditer(body)) + for index, match in enumerate(headings): + if match.group(1).strip() != section: + continue + start = match.end() + end = headings[index + 1].start() if index + 1 < len(headings) else len(body) + return body[start:end].strip() + return "" + + +def _compact_line(line: str) -> str: + return " 
".join(line.strip().split()) + + +def _truncate(text: str, limit: int = 180) -> str: + if len(text) <= limit: + return text + return text[: limit - 1].rstrip() + "…" + + +def section_bullets(text: str, section: str, limit: int) -> list[str]: + """Extract top-level bullets from a section.""" + body = get_section_body(text, section) + bullets: list[str] = [] + current: list[str] = [] + + def flush_current() -> None: + if current and len(bullets) < limit: + bullets.append(_truncate(_compact_line(" ".join(current)))) + current.clear() + + for line in body.splitlines(): + if len(bullets) >= limit: + break + stripped = line.strip() + if stripped.startswith(("- ", "* ")): + flush_current() + current.append(stripped[2:]) + continue + if current and stripped and not stripped.startswith(("```", "#")): + current.append(stripped) + flush_current() + return bullets + + +def validation_commands(text: str, limit: int) -> list[str]: + """Extract compact command lines from fenced blocks in Validation.""" + body = get_section_body(text, "Validation") + commands: list[str] = [] + for block in _FENCED_CODE_RE.findall(body): + for line in block.splitlines(): + stripped = line.strip() + if not stripped or stripped.startswith("#"): + continue + commands.append(_truncate(_compact_line(stripped), 220)) + if len(commands) >= limit: + return commands + return commands + + +def first_description_line(description: str) -> str: + for line in description.splitlines(): + compact = _compact_line(line) + if compact: + return compact + return "" + + +def compact_metadata_value(value: str) -> str: + """Compact scalar or simple YAML-list frontmatter values for one-line output.""" + parts: list[str] = [] + for line in value.splitlines(): + stripped = line.strip() + if not stripped: + continue + if stripped.startswith("- "): + stripped = stripped[2:].strip() + parts.append(stripped) + return ", ".join(parts) + + +def load_specs(repo_root: Path, *, max_where: int, max_validation: int, max_gaps: int) -> 
list[SpecSummary]: + specs_dir = repo_root / SPECS_DIR + entries: list[SpecSummary] = [] + for path in sorted(specs_dir.glob("*.md")): + if path.name in SKIP_SPEC_FILES: + continue + text = path.read_text() + fm = parse_frontmatter(text) + if not fm: + continue + entries.append( + SpecSummary( + file=str(path.relative_to(repo_root)), + title=fm.get("title", path.stem), + status=fm.get("status", ""), + mode=fm.get("mode", ""), + kind=fm.get("kind", ""), + where=section_bullets(text, "Where it lives", max_where), + validation=validation_commands(text, max_validation), + known_gaps=section_bullets(text, "Known gaps", max_gaps), + ) + ) + return entries + + +def load_skills(repo_root: Path) -> list[SkillSummary]: + skills_dir = repo_root / SKILLS_DIR + entries: list[SkillSummary] = [] + for path in sorted(skills_dir.glob("*/SKILL.md")): + text = path.read_text() + fm = parse_frontmatter(text) + if not fm.get("name"): + continue + entries.append( + SkillSummary( + file=str(path.relative_to(repo_root)), + name=fm.get("name", path.parent.name), + mode=compact_metadata_value(fm.get("mode", "")), + capability=compact_metadata_value(fm.get("capability", "")), + organization=compact_metadata_value(fm.get("organization", "")), + description=first_description_line(fm.get("description", "")), + ) + ) + return entries + + +def parse_project_scripts(pyproject_text: str) -> list[str]: + """Return script names from the [project.scripts] table only.""" + scripts: list[str] = [] + in_scripts = False + for line in pyproject_text.splitlines(): + section_match = _SECTION_RE.match(line.strip()) + if section_match: + in_scripts = section_match.group(1) == "project.scripts" + continue + if not in_scripts: + continue + script_match = _SCRIPT_RE.match(line) + if script_match: + scripts.append(script_match.group(1)) + return scripts + + +def load_tools(repo_root: Path) -> list[ToolSummary]: + tools_dir = repo_root / TOOLS_DIR + entries: list[ToolSummary] = [] + for path in sorted(p for p in 
tools_dir.iterdir() if p.is_dir() and not p.name.startswith(".")): + if path.name in SKIP_TOOL_DIRS: + continue + pyproject = path / "pyproject.toml" + readme = path / "README.md" + tests = path / "tests" + if not pyproject.exists() and not readme.exists() and not tests.exists(): + continue + scripts: list[str] = [] + if pyproject.exists(): + scripts = parse_project_scripts(pyproject.read_text()) + entries.append( + ToolSummary( + path=str(path.relative_to(repo_root)), + has_pyproject=pyproject.exists(), + has_tests=tests.is_dir(), + scripts=scripts[:5], + ) + ) + return entries + + +def build_inventory(repo_root: Path, *, max_where: int, max_validation: int, max_gaps: int) -> Inventory: + return Inventory( + specs=load_specs(repo_root, max_where=max_where, max_validation=max_validation, max_gaps=max_gaps), + skills=load_skills(repo_root), + tools=load_tools(repo_root), + ) + + +def _join(items: list[str]) -> str: + return "; ".join(items) if items else "-" + + +def format_markdown(inventory: Inventory, *, brief: bool = False) -> str: + lines = [ + "## Compact repository inventory", + "", + "Deterministic routing aid generated from local files. 
Use it to choose", + "what to inspect next, but verify claims with direct file reads or code", + "search before recording behaviour as present or absent.", + "", + "### Specs", + "", + ] + for spec in inventory.specs: + lines.append(f"- `{spec.file}` — {spec.title} [{spec.status}, {spec.mode}, {spec.kind}]") + lines.append(f" where: {_join(spec.where)}") + lines.append(f" validation: {_join(spec.validation)}") + lines.append(f" known gaps: {_join(spec.known_gaps)}") + lines.extend(["", "### Skills", ""]) + for skill in inventory.skills: + org = f", {skill.organization}" if skill.organization else "" + summary = f"- `{skill.file}` — {skill.name} [{skill.mode}, {skill.capability}{org}]" + if not brief and skill.description: + summary += f": {skill.description}" + lines.append(summary) + lines.extend(["", "### Tools", ""]) + for tool in inventory.tools: + markers = [] + if tool.has_pyproject: + markers.append("pyproject") + if tool.has_tests: + markers.append("tests") + script_suffix = f"; scripts: {', '.join(tool.scripts)}" if tool.scripts else "" + lines.append(f"- `{tool.path}` — {', '.join(markers) if markers else 'metadata'}{script_suffix}") + lines.append("") + return "\n".join(lines) + + +def format_json(inventory: Inventory) -> str: + return json.dumps(asdict(inventory), indent=2) + + +def main() -> None: + parser = argparse.ArgumentParser(description="Generate compact routing inventory for spec-loop prompts.") + parser.add_argument( + "--repo-root", type=Path, default=None, help="Repository root (default: nearest .git parent)." 
+ ) + parser.add_argument("--json", dest="as_json", action="store_true", help="Emit JSON instead of Markdown.") + parser.add_argument("--brief", action="store_true", help="Omit skill descriptions from Markdown output.") + parser.add_argument("--max-where", type=int, default=3, help="Max Where-it-lives bullets per spec.") + parser.add_argument("--max-validation", type=int, default=3, help="Max validation commands per spec.") + parser.add_argument("--max-gaps", type=int, default=2, help="Max known-gap bullets per spec.") + args = parser.parse_args() + + try: + repo_root = args.repo_root.resolve() if args.repo_root else find_repo_root(Path.cwd()) + except RuntimeError as exc: + print(f"error: {exc}", file=sys.stderr) + sys.exit(1) + + inventory = build_inventory( + repo_root, + max_where=args.max_where, + max_validation=args.max_validation, + max_gaps=args.max_gaps, + ) + if args.as_json: + print(format_json(inventory)) + else: + print(format_markdown(inventory, brief=args.brief), end="") diff --git a/tools/spec-inventory/tests/test_spec_inventory.py b/tools/spec-inventory/tests/test_spec_inventory.py new file mode 100644 index 00000000..0f44443e --- /dev/null +++ b/tools/spec-inventory/tests/test_spec_inventory.py @@ -0,0 +1,227 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. + +from __future__ import annotations + +import json +from pathlib import Path + +from spec_inventory import ( + build_inventory, + compact_metadata_value, + first_description_line, + format_json, + format_markdown, + parse_frontmatter, + parse_project_scripts, + validation_commands, +) + + +def test_parse_frontmatter_handles_block_scalars_and_comments() -> None: + text = """ + +--- +name: magpie-example +mode: Triage +description: | + First line. + Second line. +capability: capability:triage +license: Apache-2.0 +--- +""" + + fm = parse_frontmatter(text) + + assert fm["name"] == "magpie-example" + assert fm["mode"] == "Triage" + assert fm["description"] == "First line.\nSecond line." + assert fm["capability"] == "capability:triage" + + +def test_validation_commands_extracts_non_comment_lines() -> None: + text = """--- +title: Example +status: experimental +kind: feature +mode: infra +--- + +## Validation + +```bash +# comment +bash -n tools/spec-loop/loop.sh +uv run --project tools/spec-validator spec-validate +``` +""" + + assert validation_commands(text, 2) == [ + "bash -n tools/spec-loop/loop.sh", + "uv run --project tools/spec-validator spec-validate", + ] + + +def test_first_description_line_skips_blank_lines() -> None: + assert first_description_line("\n\n Useful sentence.\nMore.") == "Useful sentence." 
+ + +def test_compact_metadata_value_handles_yaml_list() -> None: + assert compact_metadata_value("- capability:triage\n- capability:review") == ( + "capability:triage, capability:review" + ) + + +def test_parse_project_scripts_only_reads_scripts_table() -> None: + pyproject = """[project] +name = "example" +version = "0.1.0" + +[project.scripts] +example-tool = "example:main" + +[tool.ruff] +line-length = 100 +""" + + assert parse_project_scripts(pyproject) == ["example-tool"] + + +def test_build_inventory_summarizes_specs_skills_and_tools(tmp_path: Path) -> None: + specs_dir = tmp_path / "tools" / "spec-loop" / "specs" + specs_dir.mkdir(parents=True) + (specs_dir / "example.md").write_text( + """ + +--- +title: Example Spec +status: experimental +kind: feature +mode: infra +source: tests +acceptance: + - It works. +--- + +# Example Spec + +## What it does + +Thing. + +## Where it lives + +- `tools/example/` + +## Behaviour & contract + +Thing. + +## Out of scope + +Thing. + +## Acceptance criteria + +1. Thing. + +## Validation + +```bash +uv run --project tools/example pytest +``` + +## Known gaps + +- Needs more tests. +""" + ) + (specs_dir / "README.md").write_text("# skip\n") + + skill_dir = tmp_path / "skills" / "example" + skill_dir.mkdir(parents=True) + (skill_dir / "SKILL.md").write_text( + """--- +name: magpie-example +mode: Triage +description: | + Summarize an example. 
+capability: capability:triage +license: Apache-2.0 +--- +""" + ) + + tool_dir = tmp_path / "tools" / "example" + tool_dir.mkdir(parents=True) + (tool_dir / "pyproject.toml").write_text( + """[project.scripts] +example-tool = "example:main" +""" + ) + (tool_dir / "tests").mkdir() + + inventory = build_inventory(tmp_path, max_where=2, max_validation=2, max_gaps=2) + + assert inventory.specs[0].title == "Example Spec" + assert inventory.specs[0].where == ["`tools/example/`"] + assert inventory.specs[0].validation == ["uv run --project tools/example pytest"] + assert inventory.specs[0].known_gaps == ["Needs more tests."] + assert inventory.skills[0].name == "magpie-example" + assert inventory.skills[0].capability == "capability:triage" + assert inventory.tools[0].path == "tools/example" + assert inventory.tools[0].scripts == ["example-tool"] + + +def test_formatters_emit_markdown_and_json(tmp_path: Path) -> None: + specs_dir = tmp_path / "tools" / "spec-loop" / "specs" + specs_dir.mkdir(parents=True) + (specs_dir / "example.md").write_text( + """--- +title: Example +status: stable +kind: feature +mode: infra +source: tests +acceptance: + - It works. +--- + +## Where it lives +- `tools/example/` +## Validation +```bash +bash -n tools/example/run.sh +``` +## Known gaps +- None. 
+""" + ) + (tmp_path / "skills").mkdir() + (tmp_path / "tools" / "example").mkdir(parents=True) + (tmp_path / "tools" / "example" / "README.md").write_text("# Example\n") + + inventory = build_inventory(tmp_path, max_where=1, max_validation=1, max_gaps=1) + markdown = format_markdown(inventory) + brief_markdown = format_markdown(inventory, brief=True) + parsed = json.loads(format_json(inventory)) + + assert "Compact repository inventory" in markdown + assert "`tools/spec-loop/specs/example.md`" in markdown + assert "Example" in brief_markdown + assert parsed["specs"][0]["title"] == "Example" diff --git a/tools/spec-loop/AGENTS.md b/tools/spec-loop/AGENTS.md index b34ba318..aa582613 100644 --- a/tools/spec-loop/AGENTS.md +++ b/tools/spec-loop/AGENTS.md @@ -30,6 +30,9 @@ Run the spec's own **Validation** block first. General checks: # Validate skill definitions (frontmatter, links, placeholders) uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate +# Validate the compact inventory helper +uv run --project tools/spec-inventory --group dev pytest tools/spec-inventory/tests + # A skill's behavioural eval suite (every skill must have one) uv run --project tools/skill-evals skill-eval tools/skill-evals/evals// diff --git a/tools/spec-loop/PROMPT_build.md b/tools/spec-loop/PROMPT_build.md index 353c5f28..41558c56 100644 --- a/tools/spec-loop/PROMPT_build.md +++ b/tools/spec-loop/PROMPT_build.md @@ -10,6 +10,8 @@ Context to load first: commands, branch + hard-limit rules). The repo-wide `/AGENTS.md` also applies (commit trailers, placeholder convention, confidentiality). - `tools/spec-loop/IMPLEMENTATION_PLAN.md` — the prioritised work items. +- The appended **Compact repository inventory** block from the runner — + use it to route to likely specs/source files before opening full files. - The appended **Open pull-request context** block from the runner. - The appended **Local work-item branches** block from the runner. 
The loop never pushes, so a work item it already built shows up here, not in @@ -19,7 +21,11 @@ Context to load first: Steps: -1. Read the appended **Open pull-request context** and **Local work-item +1. Read the appended **Compact repository inventory**. Use it as a routing + aid for selecting likely relevant specs, skills, tools, and validation + commands. It is not proof; verify the selected work item against the + plan and exact source files before changing anything. +2. Read the appended **Open pull-request context** and **Local work-item branches**. Treat both open PRs and existing local work-item branches as in-flight work. Pick the single highest-priority work item from `IMPLEMENTATION_PLAN.md`. If a **Tooling source** block is appended @@ -30,32 +36,32 @@ Steps: not already built as a local work-item branch (the loop never pushes, so a built item lives only as a local branch until a human pushes it). One only. -2. **Create its branch off the integration base**, then switch to it: +3. **Create its branch off the integration base**, then switch to it: `git checkout -b ` where `` is the work item's branch — the bare slug, **no `spec/` or other prefix** (e.g. `pairing-self-review`). NEVER commit the work to the integration branch. One branch per work item. -3. Read only the relevant spec file(s) — from the control branch if a +4. Read only the relevant spec file(s) — from the control branch if a **Tooling source** block is appended, otherwise from the working tree — plus the relevant `.claude/skills/` / `tools/` / `docs/` files from the working tree. Confirm what already exists before writing — do not assume. -4. Implement the work item **completely** — no placeholders, no stubs. +5. Implement the work item **completely** — no placeholders, no stubs. 
Skills: follow the skill format (frontmatter `name` / `description` / `license`, SPDX header, placeholder convention, every state change a confirmed proposal) **and ship an eval suite** under `tools/skill-evals/evals//` exercising each step with fixture cases (per `/AGENTS.md` § Reusable skills — a skill without a matching eval suite is incomplete). Tools: ship tests. -5. Run the work item's **Validation** command(s) from its spec (the +6. Run the work item's **Validation** command(s) from its spec (the backpressure). Fix until they pass. -6. Specs and `IMPLEMENTATION_PLAN.md` live on the control branch. If a +7. Specs and `IMPLEMENTATION_PLAN.md` live on the control branch. If a **Tooling source** block is appended, they are **not** on this work branch — do not create or edit them here; instead note any `status` or `Known gap` change in the PR body for a later plan/update beat to reconcile. (If no such block is present, the tooling is on this branch: update **only that spec's** frontmatter/Known-gaps, and never `IMPLEMENTATION_PLAN.md`.) -7. `git add -A` then `git commit` with an imperative subject and a +8. `git add -A` then `git commit` with an imperative subject and a `Generated-by: Claude (Opus 4.7)` trailer. **Never** add a `Co-Authored-By:` trailer for an agent. diff --git a/tools/spec-loop/PROMPT_plan.md b/tools/spec-loop/PROMPT_plan.md index 91489543..fef55c10 100644 --- a/tools/spec-loop/PROMPT_plan.md +++ b/tools/spec-loop/PROMPT_plan.md @@ -11,30 +11,37 @@ Context to load first: applies. - `tools/spec-loop/specs/*` — the functional description of the product. - `tools/spec-loop/IMPLEMENTATION_PLAN.md` (if present; may be stale). +- The appended **Compact repository inventory** block from the runner — + use it as the first routing map before opening full files. - The appended **Open pull-request context** block from the runner. - The appended **Local work-item branches** block from the runner. 
Built but un-pushed work items live here, not in the PR context.

 Steps:

-1. Study each spec in `tools/spec-loop/specs/` and compare it against the
+1. Read the appended **Compact repository inventory**. Use it to identify
+   the likely relevant specs, skills, tools, validation commands, and
+   known gaps before opening full files. The inventory is a routing aid,
+   not proof: before recording a gap or declaring one closed, confirm with
+   a code search or direct file read.
+2. Study each spec in `tools/spec-loop/specs/` and compare it against the
    actual code it names in **Where it lives** (`.claude/skills/`, `tools/`,
    `docs/`). You may use parallel subagents for reading. Do NOT assume
    something is missing — confirm with a code search first.
-2. Read the appended **Open pull-request context** and **Local work-item
+3. Read the appended **Open pull-request context** and **Local work-item
    branches**. Treat both open PRs and existing local work-item branches as
    in-flight work. If an apparent gap is already substantially covered by an
    open PR (including draft PRs) or already built on a local work-item
    branch, do not add it as a planned work item. The loop never pushes, so a
    built item may exist only as a local branch with no PR yet.
-3. For each spec, identify the **gaps**: a `proposed` area with no skill,
+4. For each spec, identify the **gaps**: a `proposed` area with no skill,
    a documented step that drifted from the code, a missing test, a
    `Known gaps` item. Each gap is a candidate work item.
-4. Rewrite `tools/spec-loop/IMPLEMENTATION_PLAN.md` as a prioritised list
+5. Rewrite `tools/spec-loop/IMPLEMENTATION_PLAN.md` as a prioritised list
    of work items. Each work item names: the change, the spec it serves, its
    **Validation** command, and a branch slug (``, the bare slug —
    **no `spec/` or other prefix, no numbers**).
-5. Do NOT create work items against an `off` spec (e.g. Agentic Autonomous) —
+6. Do NOT create work items against an `off` spec (e.g.
Agentic Autonomous) —
    that would skip the proof MISSION requires.

 Rules:
diff --git a/tools/spec-loop/PROMPT_update.md b/tools/spec-loop/PROMPT_update.md
index c0f5b21e..6c3d31e2 100644
--- a/tools/spec-loop/PROMPT_update.md
+++ b/tools/spec-loop/PROMPT_update.md
@@ -13,6 +13,8 @@ Context to load first:
 - `tools/spec-loop/AGENTS.md` and the repo-wide `/AGENTS.md`.
 - `tools/spec-loop/specs/*` — the current functional description.
 - The actual code: `.claude/skills/`, `tools/`, `docs/`, `docs/modes.md`.
+- The appended **Compact repository inventory** block from the runner —
+  use it as the first routing map before opening full files.

 Steps:

@@ -23,19 +25,23 @@ Steps:
    that commit. If the diff is empty, exit without creating a branch or
    commit (print "specs already in sync as of "). If no previous sync commit
    is recorded, fall through to a full inventory.
-2. **Create a uniquely-named sync branch off the integration base**, then
+2. Read the appended **Compact repository inventory**. Use it to identify
+   likely spec/source relationships and validation commands before opening
+   full files. The inventory is a routing aid, not proof; confirm with a
+   code search before recording something as present or absent.
+3. **Create a uniquely-named sync branch off the integration base**, then
    switch to it: `git checkout -b "sync-specs-$(date +%Y%m%d-%H%M%S)"`. A
    fresh branch every run keeps each sync as its own reviewable PR and never
    collides with or commits on top of a previous `sync-specs*` branch. Note
    the exact name you created — you will print it in the human-run commands
    below. Never commit the sync to the integration branch.
-3. Inventory the code with parallel subagents (full inventory only if
+4. Inventory the code with parallel subagents (full inventory only if
    step 1 did not narrow the surface):
    - every `.claude/skills/*/SKILL.md` (name, mode, what it does);
    - every `tools/*` project (what it does, its tests);
    - the mode/status table in `docs/modes.md`.
-4.
Diff that inventory against `tools/spec-loop/specs/`:
+5. Diff that inventory against `tools/spec-loop/specs/`:
    - **New functionality with no spec** → author a new topic-named spec
      (no number prefix) following the format in
      [`specs/README.md`](specs/README.md), grounded in the real code it
@@ -47,9 +53,9 @@ Steps:
      are reflected).
    - **Removed functionality** → mark the spec or move it to a
      `Known gaps`/retired note; do not silently delete history.
-5. Update `specs/overview.md` and `specs/README.md` indexes if areas were
+6. Update `specs/overview.md` and `specs/README.md` indexes if areas were
    added or renamed.
-6. `git add -A` then `git commit` with subject
+7. `git add -A` then `git commit` with subject
    `docs(spec-loop): sync specs with contributed functionality` and a
    `Generated-by: Claude (Opus 4.7)` trailer. **Do NOT touch
    `tools/spec-loop/.last-sync` yourself** — `loop.sh` amends the marker
diff --git a/tools/spec-loop/README.md b/tools/spec-loop/README.md
index ea7c566c..ae18f24d 100644
--- a/tools/spec-loop/README.md
+++ b/tools/spec-loop/README.md
@@ -80,6 +80,7 @@ See the SECURITY notes in [`loop.sh`](loop.sh).
 | [`AGENTS.md`](AGENTS.md) | Loop-scoped operational rules (repo map, validation commands, branch + hard-limit rules). |
 | `PROMPT_plan.md` / `PROMPT_build.md` / `PROMPT_update.md` / `PROMPT_consolidate.md` | The per-beat prompts. |
 | `loop.sh` | The runner. |
+| `../spec-inventory/` | Deterministic compact inventory helper appended to prompts as a routing aid. |

 ## Modes

diff --git a/tools/spec-loop/loop.sh b/tools/spec-loop/loop.sh
index 5a36809e..f2150fd8 100755
--- a/tools/spec-loop/loop.sh
+++ b/tools/spec-loop/loop.sh
@@ -335,6 +335,25 @@ local_branch_context() {
   done <<< "$branches"
 }

+compact_inventory_context() {
+  echo ""
+  if ! command -v uv >/dev/null 2>&1; then
+    echo "## Compact repository inventory"
+    echo ""
+    echo "- unavailable: uv CLI not found on PATH."
+    return 0
+  fi
+
+  local inventory
+  if inventory="$(uv run --project tools/spec-inventory spec-inventory --brief --max-where 1 --max-validation 1 --max-gaps 1 2>/dev/null)"; then
+    printf '%s\n' "$inventory"
+  else
+    echo "## Compact repository inventory"
+    echo ""
+    echo "- unavailable: spec-inventory failed. Fall back to direct file reads."
+  fi
+}
+
 while true; do
   if [ "$MAX_ITERATIONS" -gt 0 ] && [ "$ITERATION" -ge "$MAX_ITERATIONS" ]; then
     echo "Reached max iterations: $MAX_ITERATIONS"; break
@@ -387,6 +406,7 @@ while true; do
     echo "Error: could not read '$ACTIVE_PROMPT' from the working tree or control branch '$TOOLING_REF'." >&2
     rm -f "$PROMPT_WITH_CONTEXT"; break
   fi
+  compact_inventory_context >> "$PROMPT_WITH_CONTEXT"
   # Update mode just diffs code against specs; it doesn't pick a work item, so
   # the open-PR list (a network round-trip via gh) buys nothing. Skip it there.
   if [ "$MODE" != "update" ]; then
diff --git a/tools/spec-loop/specs/meta-and-quality-tooling.md b/tools/spec-loop/specs/meta-and-quality-tooling.md
index cec48c7d..64fa70dd 100644
--- a/tools/spec-loop/specs/meta-and-quality-tooling.md
+++ b/tools/spec-loop/specs/meta-and-quality-tooling.md
@@ -46,6 +46,9 @@ trustworthy as it grows.
 - `tools/spec-status-index/` — deterministic `uv` tool that reads
   `tools/spec-loop/specs/` and prints specs grouped by status; used by
   build iterations to mechanically select the next work item.
+- `tools/spec-inventory/` — deterministic `uv` tool that summarizes
+  specs, skills, and tools into a compact routing inventory for spec-loop
+  prompts.
 - `tools/spec-validator/` — validates spec-loop spec frontmatter
   (required keys, valid `status`/`kind`/`mode` values, body-section
   presence, `Known gaps` section required in functional specs,
@@ -100,12 +103,15 @@ trustworthy as it grows.
    implementation are explicitly marked reserved or future.
 6. `docs/modes.md` skill lists and shipped counts are checked against live
    skill frontmatter.
+7.
`spec-inventory` emits a compact, deterministic routing map for specs,
+   skills, and tools, and has its own tests.

 ## Validation

 ```bash
 uv run --project tools/skill-and-tool-validator --group dev pytest
 uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+uv run --project tools/spec-inventory --group dev pytest tools/spec-inventory/tests
 ```

 ## Known gaps
diff --git a/uv.lock b/uv.lock
index 95a9303e..c99e4b6c 100644
--- a/uv.lock
+++ b/uv.lock
@@ -34,6 +34,7 @@ members = [
     "security-tracker-stats-dashboard",
     "skill-and-tool-validator",
     "skill-evals",
+    "spec-inventory",
     "spec-status-index",
     "spec-validator",
     "symlink-lint",
@@ -1127,6 +1128,27 @@ dev = [
     { name = "ruff", specifier = ">=0.15.18" },
 ]

+[[package]]
+name = "spec-inventory"
+version = "0.1.0"
+source = { editable = "tools/spec-inventory" }
+
+[package.dev-dependencies]
+dev = [
+    { name = "mypy" },
+    { name = "pytest" },
+    { name = "ruff" },
+]
+
+[package.metadata]
+
+[package.metadata.requires-dev]
+dev = [
+    { name = "mypy", specifier = ">=2.1.0" },
+    { name = "pytest", specifier = ">=9.1.1" },
+    { name = "ruff", specifier = ">=0.15.18" },
+]
+
 [[package]]
 name = "spec-status-index"
 version = "0.1.0"