diff --git a/AGENTS.md b/AGENTS.md
index 21f91b62..9e6c7cde 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -332,6 +332,18 @@ yields.
 When this document says a value comes from `/project.md`, read it as
 "from `project.md`, else the project's organization, else the framework default".
 
+A project may also **pull skills from a trusted external source**. The
+committed `/skill-sources.md` file is the install gate: it
+lists the source ids the adopter trusts and commits each pin (method + URL +
+ref + verification anchor). Where a skill directory would sit, a
+`skills//source.md` redirect (frontmatter `source:` / `organization:`
+/ `skill_path:` / `evals_path:`, **not** a `SKILL.md`) names the source;
+`/magpie-setup` fetches it into the gitignored snapshot and wires it in like
+a framework skill. Per [`PRINCIPLES.md` §13](PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies)
+this is the one external home that *installs* rather than being merely
+referenced — pinned, verified, and adopter-vouched. See
+[`docs/skill-sources/`](docs/skill-sources/README.md).
+
 ### Placeholder convention used in skill files
 
 Skill files, tool-adapter docs, and this file use a small set of
diff --git a/PRINCIPLES.md b/PRINCIPLES.md
index 0539cfec..2d02ac9f 100644
--- a/PRINCIPLES.md
+++ b/PRINCIPLES.md
@@ -124,7 +124,7 @@ Skills, tool adapters, and root docs use `` / `` / `
 
 ## 13. Snapshot plus override, never vendored copies
 
-Adopters consume the framework as a gitignored snapshot at `.apache-magpie/`, pinned via a committed lock file, refreshed by one skill (`setup`). Project-specific modifications live as agent-readable markdown under `/.apache-magpie-overrides/`, committed. No git submodules. No vendored copies of framework skills inside adopter repos. Marketplaces, indexes, and catalogs may exist for discovery, never for installation.
+Adopters consume the framework as a gitignored snapshot at `.apache-magpie/`, pinned via a committed lock file, refreshed by one skill (`setup`). Project-specific modifications live as agent-readable markdown under `/.apache-magpie-overrides/`, committed. No git submodules. No vendored copies of framework skills inside adopter repos. Marketplaces, indexes, and catalogs exist for discovery. Installation is permitted only from a **trusted source** — an external organization or repository the adopter has explicitly vouched for by committing its pin (method + URL + ref + verification anchor) to the repo. Everything else stays discovery-only. A trusted install obeys the same snapshot-plus-pin discipline as the framework itself: a gitignored snapshot, a committed lock, a verified and deliberate fetch by the one `setup` skill — never a git submodule, and never an unpinned or unverified auto-fetch. See [`docs/skill-sources/`](docs/skill-sources/README.md) for the trusted-skill-source mechanism.
 
 ## 14. Skills are the unit of authorship
 
diff --git a/README.md b/README.md
index f89ee4aa..8802355e 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@
     - [Subsequent contributors](#subsequent-contributors)
   - [Drift detection](#drift-detection)
   - [Skill families](#skill-families)
+  - [External skill sources](#external-skill-sources)
   - [Maintenance](#maintenance)
   - [Cross-references](#cross-references)
 
@@ -176,6 +177,19 @@ means and which modes are still proposed vs. shipping today.
 | [**repo-health**](docs/repo-health/README.md) | Triage | Read-only repository-health audits: obsolete runner labels, Actions workflow security, dependency vulnerabilities, license/NOTICE compliance, flaky-test patterns. | 5 skills, [`docs/repo-health/`](docs/repo-health/) |
 | **utilities** | (meta) | Framework meta-skills: author or update skills (`write-skill`), restructure existing skills (`optimize-skill`), print a live index of all available skills (`list-skills`). | 3 skills |
 
+### External skill sources
+
+Beyond the in-tree families, an adopter can pull a skill or whole family
+from a **trusted external source** — a repo other than `apache/magpie` that
+ships Magpie-shaped skills (with their evals and tests). Where a skill
+directory would sit, a `skills//source.md` **redirect** names a
+pinned, verified source the adopter has vouched for; `/magpie-setup` fetches
+it into the gitignored snapshot and wires it in exactly like a framework
+skill. Nothing is fetched unless the adopter commits the pin — see
+[`docs/skill-sources/`](docs/skill-sources/README.md),
+[`PRINCIPLES.md` §13](PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies),
+and [`RFC-AI-0006`](docs/rfcs/RFC-AI-0006.md).
+
 ## Maintenance
 
 After the initial adoption, the same skill handles ongoing
diff --git a/docs/adapters/registry.md b/docs/adapters/registry.md
index 09885a74..feeb8cef 100644
--- a/docs/adapters/registry.md
+++ b/docs/adapters/registry.md
@@ -28,6 +28,12 @@ the ones that ship in-tree, the open extension points, and links to
 > your `/project.md` (or `organizations//`) at it,
 > exactly as you would a built-in one. An external link is a pointer for
 > humans to evaluate, not a supply-chain hook.
+>
+> This applies to the *adapter/organization* index on this page. Skills
+> are different: §13 permits **installing** a skill or skill-family from a
+> **trusted** external source — pinned, verified, and adopter-vouched.
+> That mechanism and its own discovery index live under
+> [`docs/skill-sources/`](../skill-sources/README.md).
 
 To author a new adapter, see [`authoring.md`](authoring.md).
 
diff --git a/docs/extending.md b/docs/extending.md
index 22322b77..3b634b2e 100644
--- a/docs/extending.md
+++ b/docs/extending.md
@@ -34,6 +34,7 @@ indexes and catalogs exist for discovery, not installation.
 | Entity | What it is | Reference |
 |---|---|---|
 | **Skill** | a workflow the agent follows | [`PRINCIPLES.md` §14](../PRINCIPLES.md#14-skills-are-the-unit-of-authorship), [`write-skill`](../skills/write-skill/SKILL.md) |
+| **Skill source** | a trusted external repo a skill/family is *pulled* from | [`docs/skill-sources/`](skill-sources/README.md), [`RFC-AI-0006`](rfcs/RFC-AI-0006.md) |
 | **Tool / tool adapter** | the only layer that knows a vendor — a backend behind a capability contract | [vendor-neutrality § Tool adapters](vendor-neutrality.md#tool-adapters), [`adapters/authoring.md`](adapters/authoring.md) |
 | **Capability contract** | the stable verb set a skill depends on; the seam adapters plug into | [`tools/cve-tool/`](../tools/cve-tool/) and siblings |
 | **Organization** | governance vocabulary + backend bundle + identity, shared by an org's projects | [`organizations/README.md`](../organizations/README.md) |
@@ -65,6 +66,13 @@ discovery.
 - **Skills** — framework skills come from the snapshot; project tweaks
   live in `.apache-magpie-overrides/.md` (consulted at run time); a
   wholly new skill you keep can live in your repo's agent-skill dir.
+  A skill or skill-family maintained in **another repo** can also be
+  *pulled in* from a **trusted external source** — a `skills//source.md`
+  redirect that names a pinned, verified source the adopter has vouched
+  for, fetched into the snapshot and wired in like a framework skill. This
+  is the one external home that *installs* rather than merely being
+  referenced (see [`PRINCIPLES.md` §13](../PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies)
+  and [`docs/skill-sources/`](skill-sources/README.md)).
 - **Tools / adapters** — selected per capability in `/project.md`
   *Tools enabled*. The selected adapter may be an in-tree
   `tools//`, a directory you keep in your adopter
diff --git a/docs/index.md b/docs/index.md
index 5c51d26f..caca446b 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -121,6 +121,7 @@ from mailing lists, slack etc.
 | Understand the full vision | [MISSION.md](../MISSION.md) |
 | Understand how it stays vendor-neutral | [vendor-neutrality.md](vendor-neutrality.md) |
 | Find or author a backend adapter | [adapters/registry.md](adapters/registry.md) |
+| Pull a skill/family from a trusted external source | [skill-sources/README.md](skill-sources/README.md) |
 | Extend Magpie (project / org / individual) | [extending.md](extending.md) |
 | See what skills exist today | [modes.md](modes.md) |
 | Adopt in my project | [README → Adopting](../README.md#adopting-the-framework) |
diff --git a/docs/rfcs/RFC-AI-0006.md b/docs/rfcs/RFC-AI-0006.md
new file mode 100644
index 00000000..524bfc72
--- /dev/null
+++ b/docs/rfcs/RFC-AI-0006.md
@@ -0,0 +1,256 @@
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [RFC-AI-0006: Trusted external skill sources](#rfc-ai-0006-trusted-external-skill-sources)
+  - [Abstract](#abstract)
+  - [Status of this document](#status-of-this-document)
+  - [Motivation](#motivation)
+  - [Proposal](#proposal)
+    - [The three-layer trust model](#the-three-layer-trust-model)
+    - [Source descriptor](#source-descriptor)
+    - [Pointer file — the redirect](#pointer-file--the-redirect)
+    - [Fetch, verify, pin](#fetch-verify-pin)
+    - [Symlink and eval binding](#symlink-and-eval-binding)
+    - [Amending PRINCIPLES §13](#amending-principles-13)
+  - [Security model](#security-model)
+  - [Drawbacks](#drawbacks)
+  - [Alternatives considered](#alternatives-considered)
+  - [Out of scope](#out-of-scope)
+  - [References](#references)
+
+
+
+
+# RFC-AI-0006: Trusted external skill sources
+
+## Abstract
+
+Every Magpie skill ships in one repository — `apache/magpie` — and reaches
+adopters through one mechanism: the [`setup`](../../skills/setup/SKILL.md)
+skill downloads the whole framework into a gitignored snapshot, pins it in a
+committed lock, and symlinks the selected skills into agent dirs. There is
+no way to pull an individual skill or skill-family from a **different**
+repository or organization. The "External (another repo)" home in
+[`docs/extending.md`](../extending.md) exists only as a "vendor it in by
+hand" note, and [`PRINCIPLES.md` §13](../../PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies)
+forbade installation from anything but the one framework snapshot.
+
+This RFC introduces **trusted external skill sources**: a "redirect"
+pointer where a skill directory would sit, naming a remote source (a GitHub
+folder, an SVN/`dist` archive, or a git tag/branch) from which the skill and
+**all its related files — including evals and tests** — are fetched, pinned,
+verified, and wired in so the skill behaves identically to an in-tree one.
+It amends §13 to permit installation from a **trusted** source — one the
+adopter has explicitly vouched for — while keeping the snapshot-plus-pin
+discipline, and adds a per-organization curation layer so the owning
+repository/organization of every source is explicit.
+
+## Status of this document
+
+**Proposed.** The design lands in phases: **Phase A** — this RFC, the §13
+amendment, the descriptor/pointer formats under
+[`docs/skill-sources/`](../skill-sources/README.md), the org/project
+`skill-sources.md` files, and `skill-and-tool-validator` support — is the
+review checkpoint. **Phase B** — the `setup`-skill fetch/lock/symlink wiring
+(`.apache-magpie.sources.lock`, `/magpie-setup skill-sources`, and the
+`adopt`/`upgrade`/`verify` integration) plus a worked example — follows.
+
+## Motivation
+
+The framework is deliberately one skill-authorship boundary
+([§14](../../PRINCIPLES.md#14-skills-are-the-unit-of-authorship)) with one
+distribution channel. That is right for the core, but it blocks three real
+needs already visible in the extension model:
+
+1. **Sub-project and podling skills.** An ASF sub-project or incubating
+   podling may maintain skills specific to itself that should not live in
+   `apache/magpie` yet still be adoptable by its own repos with the same
+   ergonomics as a framework skill.
+2. **Organization-private skills.** A company or collective running Magpie
+   across many repos wants to maintain a shared skill-family in one place
+   and pull it into each repo — without vendoring copies or a submodule.
+3. **Community skills.** A third party maintains a useful skill-family and
+   others want to adopt it deliberately, pinned and verified, not by
+   copy-paste.
+
+Today all three fall back to "clone it in by hand" — unpinned, unverified,
+and invisible to drift detection, `verify`, and the eval discipline. The
+machinery to do this *properly* already exists for the framework itself: a
+verified fetch (git tag/branch, or `svn-zip` with SHA-512 + GPG), a
+committed pin, a two-lock drift model, and the canonical-plus-relay symlink
+wiring. This RFC generalizes that machinery from **one** source to **N named
+trusted** sources.
+
+The one blocker is principle, not plumbing: §13 said catalogs are "for
+discovery, never for installation." This RFC narrows that to "never for
+*untrusted* installation" — an adopter-vouched, pinned, verified source is
+as safe to install as the framework snapshot, because it *is* the same
+mechanism.
+
+## Proposal
+
+### The three-layer trust model
+
+Trust is layered so an organization can curate candidates while the adopter
+keeps the final say. Nothing is fetched until the adopter opts in.
+
+| Layer | File | Home | Role |
+|---|---|---|---|
+| **Discovery** | [`docs/skill-sources/registry.md`](../skill-sources/registry.md) | in-tree | Editorial index of known sources. Lists, never installs. |
+| **Org-curated** | `organizations//skill-sources.md` | in-tree / adopter-local org override | An org vouches for candidate sources its projects may adopt. |
+| **Adopter opt-in** | `/skill-sources.md` | committed in the adopter repo | **The install gate.** Lists the trusted source ids and commits each pin. Only sources here are fetched. |
+
+This mirrors the existing `project → organization → framework` precedence
+([`AGENTS.md`](../../AGENTS.md#configuration-resolution-order)): an org
+curates a default set; the adopter overrides — trusting a source the org did
+not curate, or declining one it did.
+
+### Source descriptor
+
+A descriptor identifies one source and enumerates what it `provides`,
+reusing the install-method and lock vocabulary the framework snapshot
+already uses:
+
+```yaml
+id: # unique, kebab-case
+organization: # owning org; must name a directory under organizations/
+name: ""
+maintainer: ""
+method:
+url:
+ref:
+# verification anchor: commit (git-tag) | sha512 (svn-zip)
+layout:
+  skills_root: skills
+  evals_root: tools/skill-evals/evals
+provides:
+  - skill:
+  - family: -*
+```
+
+### Pointer file — the redirect
+
+Where a skill directory would sit, `skills//source.md` names the
+source. It is the "redirect link"; the skill body, evals, and tests are
+fetched into the gitignored snapshot, not committed here. It is named
+`source.md` — **not** `SKILL.md` — so the validator's `SKILL.md`-gated
+checks (required frontmatter, name convention, injection guard) do not fire
+on a stub. Its frontmatter uses `source:` (already an allowed optional key)
+plus `organization:`, `skill_path:`, and `evals_path:`. Full format in
+[`docs/skill-sources/README.md`](../skill-sources/README.md#pointer-file--the-redirect).
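To make the `provides` semantics concrete, here is a minimal Python sketch of how a setup pass might expand a descriptor's `provides` list into concrete skill directory names once a source has been fetched. It assumes the descriptor is already parsed into a list of one-key dicts and that the fetched `skills_root` has been listed into a set of names. The function name `resolve_provides`, the example skill names, and the choice to fail loudly on an entry that matches nothing are illustrative assumptions, not part of this RFC.

```python
from fnmatch import fnmatchcase


def resolve_provides(provides, available_skills):
    """Expand a descriptor's `provides` list into sorted skill names.

    Hypothetical helper: `provides` is a list of one-key dicts such as
    {"skill": "name"} or {"family": "prefix-*"}; `available_skills` is the
    set of directory names found under the fetched source's skills_root.
    """
    selected = set()
    for entry in provides:
        if "skill" in entry:
            # A single, unprefixed skill directory name.
            name = entry["skill"]
            if name not in available_skills:
                raise ValueError(f"source does not ship skill {name!r}")
            selected.add(name)
        elif "family" in entry:
            # A family prefix glob pulls every skill whose name matches it.
            pattern = entry["family"]
            matches = {s for s in available_skills if fnmatchcase(s, pattern)}
            if not matches:
                raise ValueError(f"family {pattern!r} matched no skills")
            selected |= matches
        else:
            raise ValueError(f"unrecognized provides entry: {entry!r}")
    # Sorting keeps the resulting symlink set deterministic between runs.
    return sorted(selected)


if __name__ == "__main__":
    fetched = {"audit-deps", "audit-licenses", "triage-issues"}
    print(resolve_provides([{"family": "audit-*"}, {"skill": "triage-issues"}], fetched))
```

Run as a script, it prints `['audit-deps', 'audit-licenses', 'triage-issues']`: the family glob selects both `audit-*` skills and the explicit entry adds the third.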
+
+### Fetch, verify, pin
+
+`/magpie-setup skill-sources` (and the source pass folded into adoption)
+reads `/skill-sources.md`, then for each trusted source
+fetches into `.apache-magpie-sources//` (gitignored) reusing the
+framework [install recipes](../setup/install-recipes.md) verbatim — `git
+clone --depth=1 --branch ` for git methods, download + `sha512sum -c` +
+optional `gpg --verify` for `svn-zip`. Two locks record the result, exactly
+as for the framework snapshot:
+
+- **`.apache-magpie.sources.lock`** (committed) — per-source pin
+  (`method`/`url`/`ref` + `commit`|`sha512`), keyed by `id`.
+- **`.apache-magpie.sources.local.lock`** (gitignored) — per-source fetch
+  fingerprint (`source_*`, `fetched_commit`, `fetched_at`).
+
+Drift detection, `upgrade`, and `verify` extend to these locks with the same
+logic already used for the framework snapshot.
+
+### Symlink and eval binding
+
+For each provided skill, the canonical + relay symlinks are created exactly
+as for framework skills — `.agents/skills/magpie-` →
+`../../.apache-magpie-sources//skills//`, with per-agent relays
+back through the canonical entry (`symlink-lint`'s no-cycles +
+relay-through-canonical invariants hold unchanged). Because a fetch pulls
+both the `skills/` tree and the `tools/skill-evals/evals/` tree, the eval
+suite's directory-name + `skill_md:`-path binding resolves after the fetch,
+so a pulled skill is eval-able and testable exactly as in its home repo.
+The one requirement on a source repo is the two-tree layout, declared in the
+descriptor's `layout:` block.
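The canonical link target described above is a plain relative path, so it can be derived mechanically from the source id, the skill name, and the descriptor's `skills_root`. What follows is a minimal sketch. It assumes POSIX path separators and a setup pass that runs from the adopter repo root. `canonical_link`, `create_canonical_link`, and the example ids are hypothetical, and the per-agent relay links are not modeled.

```python
import os
import os.path


def canonical_link(skill, source_id, skills_root="skills"):
    """Return (link_path, link_target) for one pulled skill.

    Hypothetical helper. Paths are relative to the adopter repo root; the
    target is expressed relative to the directory that holds the link,
    which is how the OS resolves a relative symlink.
    """
    link_dir = os.path.join(".agents", "skills")
    link_path = os.path.join(link_dir, f"magpie-{skill}")
    snapshot_skill = os.path.join(".apache-magpie-sources", source_id, skills_root, skill)
    link_target = os.path.relpath(snapshot_skill, start=link_dir)
    return link_path, link_target


def create_canonical_link(skill, source_id, skills_root="skills"):
    """Create the canonical symlink, replacing a stale link if one exists."""
    link_path, link_target = canonical_link(skill, source_id, skills_root)
    os.makedirs(os.path.dirname(link_path), exist_ok=True)
    if os.path.islink(link_path):
        os.remove(link_path)  # removes the link itself, not its target
    os.symlink(link_target, link_path, target_is_directory=True)
    return link_path
```

For example, `canonical_link("audit-deps", "acme-tools")` yields the link `.agents/skills/magpie-audit-deps` with target `../../.apache-magpie-sources/acme-tools/skills/audit-deps`. Computing the target with `relpath` rather than hard-coding `../../` keeps the sketch correct if a `layout:` block ever moves `skills_root`.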
+
+### Amending PRINCIPLES §13
+
+§13's final sentence changes from "catalogs may exist for discovery, never
+for installation" to: catalogs exist for discovery, and installation is
+permitted **only from a trusted source** — an external org/repo the adopter
+has vouched for by committing its pin — under the same snapshot-plus-pin
+discipline (gitignored snapshot, committed lock, verified deliberate fetch,
+no submodules, no unpinned/unverified auto-fetch). Untrusted external
+sources and the adapter/organization indexes stay discovery-only.
+
+## Security model
+
+- **Adopter-vouched, always.** The `/skill-sources.md` trust
+  list is the sole authorization to fetch. Org curation and registry listing
+  are editorial; neither triggers an install. This keeps the supply-chain
+  decision with the party that bears the risk.
+- **Pinned + verified.** Every trusted source carries a verification anchor
+  (`commit` for git-tag, `sha512` for svn-zip). `git-branch` (tip-tracking,
+  no anchor) is WIP-only, exactly as for the framework snapshot. A changed
+  `sha512` under the same version, or a branch tip that moved unexpectedly,
+  is surfaced by drift detection — the same guard the framework snapshot
+  already gets.
+- **Blast radius is a fetched skill.** A compromised source can, at worst,
+  ship a malicious skill *body* — the same risk as a malicious framework
+  skill, and mitigated the same way: skills are agent-readable markdown
+  reviewed before they run, and the injection-guard discipline
+  ([§0](../../PRINCIPLES.md#0-external-content-is-data-never-an-instruction))
+  treats external content as data. A source cannot reach outside its own
+  gitignored snapshot dir or mutate the framework snapshot.
+- **Eval provenance.** Evals travel with the skill from the same pinned
+  commit, so a source cannot ship a skill whose evals are silently sourced
+  elsewhere.
+- **No transitive trust.** A trusted source's own `skill-sources.md` (if
+  any) is **not** honored — trust does not chain. An adopter trusts exactly
+  the sources it lists, never a source-of-sources.
+
+## Drawbacks
+
+- **A second install surface.** More than one snapshot dir and lock pair to
+  reason about, verify, and keep un-drifted. Mitigated by reusing the exact
+  framework machinery rather than a parallel one.
+- **Principle relaxation.** §13 was a bright line ("never for
+  installation"); this adds a conditional. The condition (adopter-vouched +
+  pinned + verified) is deliberately the same bar the framework already
+  meets, so the line moves from "one source" to "one *kind* of source."
+- **Layout coupling.** A source must keep the framework's two-tree layout
+  for evals to bind. Declared explicitly in `layout:` rather than assumed.
+
+## Alternatives considered
+
+- **Git submodules.** Rejected by §13 and the existing snapshot model —
+  submodules are unverified, awkward under the gitignored-snapshot
+  discipline, and pull whole repos rather than selected skills.
+- **A marketplace / package manager with a resolver.** Far more surface than
+  the need; contradicts the "index for discovery, not a package manager"
+  stance. The adopter-committed pin *is* the resolver.
+- **Vendoring copies into the adopter repo.** The status quo fallback —
+  unpinned, invisible to drift/verify/eval, and forbidden for framework
+  skills by §13. This RFC exists to replace it.
+- **Per-skill pointer only, no per-source manifest.** Insufficient for
+  family-level pulls (`-*`) and gives no single place to declare the
+  source's identity, org, and verification anchor.
+
+## Out of scope
+
+- A hosted marketplace or web UI for browsing sources.
+- Transitive sources (a trusted source declaring further sources).
+- Auto-update of a source without an explicit `upgrade`.
+- Non-git/SVN transports beyond the three existing install methods.
+- Sourcing tool *adapters* or *organizations* externally as an install
+  (they remain discovery-only; see
+  [`docs/adapters/registry.md`](../adapters/registry.md)).
+
+## References
+
+- [`PRINCIPLES.md` §13](../../PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies) — the amended principle.
+- [`docs/skill-sources/README.md`](../skill-sources/README.md) — the trust model, descriptor, and pointer formats.
+- [`docs/skill-sources/registry.md`](../skill-sources/registry.md) — the discovery index.
+- [`docs/extending.md`](../extending.md) — the extension model this generalizes.
+- [`skills/setup/SKILL.md`](../../skills/setup/SKILL.md) — the adopt/upgrade/verify flow and the framework snapshot lock model.
+- [`docs/setup/install-recipes.md`](../setup/install-recipes.md) — the fetch/verify recipes reused per source.
diff --git a/docs/skill-sources/README.md b/docs/skill-sources/README.md
new file mode 100644
index 00000000..b4b0f04c
--- /dev/null
+++ b/docs/skill-sources/README.md
@@ -0,0 +1,194 @@
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Trusted external skill sources](#trusted-external-skill-sources)
+  - [The trust model — three layers](#the-trust-model--three-layers)
+  - [Source descriptor](#source-descriptor)
+  - [Pointer file — the redirect](#pointer-file--the-redirect)
+  - [How a trusted skill is installed](#how-a-trusted-skill-is-installed)
+  - [Layout contract — skills, evals, tests](#layout-contract--skills-evals-tests)
+  - [Security model](#security-model)
+  - [Discovery index](#discovery-index)
+  - [See also](#see-also)
+
+
+
+
+# Trusted external skill sources
+
+A **skill source** is a repository — other than `apache/magpie` — that
+ships Magpie-shaped skills (and their evals and tests). This page defines
+how an adopter pulls a skill or whole skill-family from such a source and
+wires it in so it behaves **exactly like an in-tree skill**: same
+`magpie-`-prefixed symlink relay, same override layer, same eval binding.
+
+Per [`PRINCIPLES.md` §13](../../PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies),
+installation is permitted **only from a *trusted* source** — one the
+adopter has explicitly vouched for by committing its pin (method + URL +
+ref + verification anchor) to the repo. A trusted install obeys the same
+snapshot-plus-pin discipline the framework uses for itself: a gitignored
+snapshot, a committed lock, a verified and deliberate fetch by the one
+[`setup`](../../skills/setup/SKILL.md) skill — never a git submodule, and
+never an unpinned or unverified auto-fetch. The full rationale and threat
+model are in [`RFC-AI-0006`](../rfcs/RFC-AI-0006.md).
+
+## The trust model — three layers
+
+Trust is layered so an organization can *curate* candidate sources while
+the *adopter* keeps the final say. Nothing is fetched until the adopter
+opts in.
+
+| Layer | File | Home | Role |
+|---|---|---|---|
+| **Discovery** | [`registry.md`](registry.md) | in-tree | The framework's index of known sources — curated **and** community. Editorial only; lists a source, never installs it. |
+| **Org-curated** | `organizations//skill-sources.md` | in-tree (or adopter-local org override) | An organization vouches for a set of sources its projects may draw from. Inherited by naming `organization: `. Still not an install. |
+| **Adopter opt-in** | `/skill-sources.md` | committed in the adopter repo | **The install gate.** The adopter lists the source ids it trusts and commits each pin. *Only sources listed here are ever fetched.* |
+
+An adopter may trust a source their org did **not** curate (list it
+directly with a full descriptor), or decline one the org did. The org
+layer is a convenience default, never a mandate — the same
+`project → organization → framework` precedence the rest of the config
+model uses (see [`AGENTS.md`](../../AGENTS.md#configuration-resolution-order)).
+
+## Source descriptor
+
+A **descriptor** identifies one source and enumerates what it `provides`.
+It appears in the org-curated file and/or the adopter opt-in file; the
+[registry](registry.md) links to the canonical one. Fields reuse the
+[install-method](../setup/install-recipes.md) and lock vocabulary the
+framework snapshot already uses, so resolution is mechanical.
+
+```yaml
+id: # unique, kebab-case — the handle pointers reference
+organization: # owning org; must name a directory under organizations/
+name: ""
+maintainer: ""
+method: # same three install methods as the framework
+url:
+ref:
+# Verification anchor — the re-fetch guard, per method:
+#   git-tag    : commit:
+#   svn-zip    : sha512:
+# git-branch has no cryptographic anchor — it tracks the branch tip
+layout: # where things live inside the source repo
+  skills_root: skills # default: skills
+  evals_root: tools/skill-evals/evals # default: tools/skill-evals/evals
+provides:
+  - skill: # one unprefixed skill directory name
+  - family: -* # or a family prefix — pulls every skill matching it
+```
+
+`method`, `url`, `ref`, and the per-method anchor are exactly the keys the
+framework's own [`.apache-magpie.lock`](../../skills/setup/SKILL.md) carries;
+`svn-zip` is the only method with cryptographic verification (SHA-512 +
+optional GPG against the source's `KEYS`), `git-tag` pins a resolved
+`commit`, and `git-branch` tracks a tip (WIP only, no frozen anchor).
+
+## Pointer file — the redirect
+
+Where a skill directory would sit, a **pointer file** names its source.
+It is the "redirect link": the skill body, evals, and tests are **not**
+committed here — they are fetched into the gitignored snapshot at
+adopt/upgrade time. The file is `skills//source.md` (deliberately
+**not** `SKILL.md`, so the skill validator's `SKILL.md`-gated checks do
+not fire on a stub).
+
+```markdown
+---
+source: # references a descriptor above
+organization: # must name a directory under organizations/
+skill_path: skills/ # subpath of the skill within the source repo
+evals_path: tools/skill-evals/evals/ # subpath of its eval suite
+---
+
+
+
+# — redirect to a trusted external source
+
+This skill is provided by the trusted external source ``
+(`organization: `). Its `SKILL.md`, eval suite, and tests are fetched
+into the gitignored snapshot at `.apache-magpie-sources//` by
+`/magpie-setup` and symlinked in exactly like an in-tree skill. This file
+is a pointer only — do not add skill logic here; contribute it to the
+source repo instead.
+```
+
+`source:` is already an allowed optional key in the skill validator's
+frontmatter set, so nothing about the pointer is a special case for the
+common-path validation — only the additional pointer-specific checks in
+[the validator](../../tools/skill-and-tool-validator/) apply (the `source:`
+resolves to a known descriptor; the `organization:` is a known org; the
+directory draws no eval-coverage advisory because its evals are external).
+
+## How a trusted skill is installed
+
+The [`setup`](../../skills/setup/SKILL.md) skill drives the fetch (see
+`/magpie-setup skill-sources`). In outline:
+
+1. Read `/skill-sources.md` — the trust list. Sources not
+   listed there are never fetched.
+2. For each trusted source, **fetch + verify** into
+   `.apache-magpie-sources//` (gitignored) reusing the framework
+   [install recipes](../setup/install-recipes.md) verbatim — `git clone
+   --depth=1 --branch ` for git methods; download + `sha512sum -c` +
+   optional `gpg --verify` for `svn-zip`.
+3. Record the pins: committed `.apache-magpie.sources.lock` (per-source
+   `method`/`url`/`ref` + anchor) and gitignored
+   `.apache-magpie.sources.local.lock` (what this machine fetched + when) —
+   the same two-lock drift model as the framework snapshot.
+4. For each provided skill, create the canonical + relay symlinks
+   (`.agents/skills/magpie-` → `../../.apache-magpie-sources//skills//`,
+   with per-agent relays back through the canonical entry) — identical to
+   how framework-family skills are wired.
+
+Drift detection, `upgrade`, and `verify` extend to the source locks: a
+committed-vs-local mismatch surfaces the gap and proposes
+`/magpie-setup upgrade`, which re-fetches per the committed pins.
+
+## Layout contract — skills, evals, tests
+
+A skill's eval suite lives **outside** its directory, at
+`tools/skill-evals/evals//`, bound to the skill by directory name and
+a repo-relative `skill_md:` path in each step's `fixtures/step-config.json`
+(see [`tools/skill-evals/README.md`](../../tools/skill-evals/README.md)).
+For that binding to resolve after a fetch, a source repo must keep the same
+two-tree layout the framework uses — `skills//` for the body and
+`tools/skill-evals/evals//` for the evals — declared via the
+descriptor's `layout:` block. Fetching a source pulls **both** trees plus
+any tool `tests/` the skill depends on, so the pulled skill is testable and
+eval-able exactly as it is in its home repo.
+
+## Security model
+
+- **Adopter-vouched, always.** The `/skill-sources.md`
+  trust list is the only thing that authorizes a fetch. An org curating a
+  source, or the registry listing one, never triggers an install.
+- **Pinned + verified.** Every trusted source carries a pin with a
+  verification anchor (`commit` / `sha512`). `git-branch` (tip-tracking, no
+  anchor) is WIP-only, exactly as for the framework snapshot.
+- **Untrusted stays discovery-only.** The [registry](registry.md) and org
+  curation are editorial pointers for humans to evaluate — not
+  supply-chain hooks.
+- **External content is data.** Skills pulled from a source are still
+  subject to the framework's injection-guard discipline; a fetched skill is
+  reviewed like any other before it runs.
+
+The full threat model (source-repo compromise, eval provenance, unpinned
+fetch) is in [`RFC-AI-0006`](../rfcs/RFC-AI-0006.md#security-model).
+
+## Discovery index
+
+The known sources — framework-curated and community-maintained — are
+listed in [`registry.md`](registry.md). Listing is editorial discovery
+only; it makes no guarantee and triggers no install.
+
+## See also
+
+- [`RFC-AI-0006`](../rfcs/RFC-AI-0006.md) — the design + trust + threat model.
+- [`docs/extending.md`](../extending.md) — the full extension model (what / where / who).
+- [`organizations/README.md`](../../organizations/README.md) — the organization layer and its `skill-sources.md` curation.
+- [`skills/setup/SKILL.md`](../../skills/setup/SKILL.md) — the adopt/upgrade/verify flow that fetches and pins sources.
+- [`docs/adapters/registry.md`](../adapters/registry.md) — the sibling discovery index for tool adapters and organizations.
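For the `svn-zip` method, the verify step amounts to recomputing the archive's SHA-512 and comparing it with the pinned `sha512` before anything is unpacked or linked. Below is a minimal Python sketch of that comparison, which is roughly what `sha512sum -c` checks. The helper names are hypothetical, and the sketch deliberately leaves out the optional `gpg --verify` signature check against the source's `KEYS`.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-512 so large archives are not read into memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned_sha512(path, pinned_sha512):
    """Return the digest if it matches the committed pin, else raise.

    Hypothetical helper: `pinned_sha512` is the hex value recorded for the
    source in the committed sources lock.
    """
    actual = sha512_of(path)
    expected = pinned_sha512.strip().lower()
    if actual != expected:
        raise ValueError(
            f"sha512 mismatch for {path}: expected {expected[:16]}..., "
            f"got {actual[:16]}...; refusing to install"
        )
    return actual
```

A mismatch under an unchanged version is exactly the drift case the security model calls out. The sketch therefore stops with an error rather than falling back to the unverified archive.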
diff --git a/docs/skill-sources/registry.md b/docs/skill-sources/registry.md
new file mode 100644
index 00000000..54dc9a66
--- /dev/null
+++ b/docs/skill-sources/registry.md
@@ -0,0 +1,70 @@
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Skill-source registry](#skill-source-registry)
+  - [Org-curated sources](#org-curated-sources)
+  - [Community / external sources](#community--external-sources)
+    - [Adding a source to this index](#adding-a-source-to-this-index)
+  - [See also](#see-also)
+
+
+
+
+# Skill-source registry
+
+A **discovery** index of the external [skill sources](README.md) Magpie
+knows about — repositories other than `apache/magpie` that ship
+Magpie-shaped skills. It is the skills counterpart to the
+[adapter registry](../adapters/registry.md).
+
+> **Discovery, then adopter-vouched install.** Listing a source here is
+> **editorial only** — it makes no guarantee about the source and triggers
+> no install. Per [`PRINCIPLES.md` §13](../../PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies),
+> a source is installed only after the *adopter* trusts it explicitly by
+> committing its pin to `/skill-sources.md`. An entry here
+> is a pointer for humans to evaluate, not a supply-chain hook. See
+> [`README.md`](README.md) for the trust model and the descriptor format.
+
+## Org-curated sources
+
+Sources an organization vouches for, declared in
+`organizations//skill-sources.md` and inherited by projects that set
+`organization: `. Curation is still not installation — the adopter
+opts each one in.
+
+| Organization | Curated sources | File |
+|---|---|---|
+| Apache Software Foundation | *(none listed yet)* | [`organizations/ASF/skill-sources.md`](../../organizations/ASF/skill-sources.md) |
+| Independent (no formal governing body) | *(none listed yet)* | [`organizations/independent/skill-sources.md`](../../organizations/independent/skill-sources.md) |
+
+## Community / external sources
+
+Sources maintained **outside** any in-tree organization curation — kept in
+their authors' own repos and linked here for discovery. An adopter wires
+one in by writing a full descriptor into their
+`/skill-sources.md` (see the [trust model](README.md#the-trust-model--three-layers));
+the framework never fetches them unprompted.
+
+| Source id | Owning org | Maintainer | Repository | Notes |
+|---|---|---|---|---|
+| *(none listed yet)* | | | | Open a PR to add a row — see below. |
+
+### Adding a source to this index
+
+Open a PR against `apache/magpie` that adds one row with: the source id,
+the owning organization (a directory under `organizations/`), the
+maintainer, a link to the source repository, and a one-line note. If the
+source belongs to an in-tree organization, add its descriptor to that
+org's `skill-sources.md` and reference it from the *Org-curated* table
+instead. Listing here is **editorial discovery only** — it makes no
+guarantee and triggers no install.
+
+## See also
+
+- [`README.md`](README.md) — the trust model, descriptor, and pointer-file formats.
+- [`RFC-AI-0006`](../rfcs/RFC-AI-0006.md) — design and threat model.
+- [`docs/adapters/registry.md`](../adapters/registry.md) — the sibling discovery index for tool adapters and organizations.
+- [`docs/extending.md`](../extending.md) — the full extension model.
diff --git a/docs/vendor-neutrality.md b/docs/vendor-neutrality.md
index c69c9405..55b3dca2 100644
--- a/docs/vendor-neutrality.md
+++ b/docs/vendor-neutrality.md
@@ -259,9 +259,11 @@ organization profile — you author one, and you have two supported paths:
    [discovery index](adapters/registry.md) of in-tree and
    community-maintained adapters — but, per
    [`PRINCIPLES.md` §13](../PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies),
-   an index is **for discovery, never for installation**: nothing is
+   the adapter index is **for discovery, never for installation**: nothing is
    auto-fetched, and you wire an external adapter in deliberately, exactly
-   as you would a built-in one.
+   as you would a built-in one. (Trusted external *skill* sources are the
+   one installable exception §13 carves out — pinned, verified, and
+   adopter-vouched; see [`docs/skill-sources/`](skill-sources/README.md).)
 
 Either way the skills stay agnostic: they target the capability, and your
 adapter — wherever it lives — supplies the backend. The same three homes
diff --git a/organizations/ASF/skill-sources.md b/organizations/ASF/skill-sources.md
new file mode 100644
index 00000000..691e3930
--- /dev/null
+++ b/organizations/ASF/skill-sources.md
@@ -0,0 +1,29 @@
+
+
+**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Apache Software Foundation — curated skill sources](#apache-software-foundation--curated-skill-sources)
+  - [Curated sources](#curated-sources)
+
+
+
+
+# Apache Software Foundation — curated skill sources
+
+The external [skill sources](../../docs/skill-sources/README.md) the ASF
+organization **vouches for**. A project that sets `organization: ASF` sees
+these as candidate sources it *may* adopt — curation is **not**
+installation. The project still opts each one in by committing its pin to
+[`/skill-sources.md`](../../projects/_template/skill-sources.md).
+ +## Curated sources + +*(none listed yet)* — the ASF organization curates no external skill +sources at this time. Every skill an ASF project runs today ships in-tree +in `apache/magpie`. When the ASF vouches for a source (for example a +sub-project or incubating-podling skill repo), add its +[descriptor](../../docs/skill-sources/README.md#source-descriptor) here and +a row to the *Org-curated sources* table in +[`docs/skill-sources/registry.md`](../../docs/skill-sources/registry.md). diff --git a/organizations/README.md b/organizations/README.md index 66677ae7..768a6dd0 100644 --- a/organizations/README.md +++ b/organizations/README.md @@ -7,6 +7,7 @@ - [Why this exists](#why-this-exists) - [Resolution order](#resolution-order) - [What ships here](#what-ships-here) + - [Curated skill sources](#curated-skill-sources) - [Authoring a new organization](#authoring-a-new-organization) @@ -90,6 +91,20 @@ not branch on the organization. | [`independent/`](independent/) | The **no-formal-organization** baseline — DCO sign-off, GitHub-native security/releases, no mailing-list/forwarder/metadata backends. Used by [`projects/non-asf-example/`](../projects/non-asf-example/). | | [`_template/`](_template/) | Authoring skeleton for a **new** organization. | +## Curated skill sources + +An organization may also **vouch for external skill sources** — repos other +than `apache/magpie` that ship Magpie-shaped skills its projects may adopt. +These are listed in `organizations//skill-sources.md` (see +[`_template/skill-sources.md`](_template/skill-sources.md)). Curation is +**not** installation: a project under the organization still opts each +source in by committing its pin to `/skill-sources.md`, the +[install gate](../docs/skill-sources/README.md#the-trust-model--three-layers). 
+The full mechanism — descriptor format, pointer files, and the pinned + +verified fetch — lives in [`docs/skill-sources/`](../docs/skill-sources/README.md) +([`PRINCIPLES.md` §13](../PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies), +[`RFC-AI-0006`](../docs/rfcs/RFC-AI-0006.md)). + ## Authoring a new organization Copy [`_template/`](_template/) to fill in the governance vocabulary, the diff --git a/organizations/_template/README.md b/organizations/_template/README.md index 5c0baeb8..ec7a49f5 100644 --- a/organizations/_template/README.md +++ b/organizations/_template/README.md @@ -20,7 +20,11 @@ Authoring skeleton for a new [organization](../README.md). framework default. 3. Point a project at it: set `organization: ` in the project's `/project.md`. -4. Optionally **contribute it upstream** to `apache/magpie` under +4. Optionally list the external skill sources your organization vouches + for in [`skill-sources.md`](skill-sources.md) — curation only; projects + still opt each one in. See + [`docs/skill-sources/`](../../docs/skill-sources/README.md). +5. Optionally **contribute it upstream** to `apache/magpie` under Apache-2.0 so every project in your organization reuses it, or keep it local. See [`docs/vendor-neutrality.md` § Authoring your own adapter](../../docs/vendor-neutrality.md#authoring-your-own-adapter). diff --git a/organizations/_template/skill-sources.md b/organizations/_template/skill-sources.md new file mode 100644 index 00000000..2eada28e --- /dev/null +++ b/organizations/_template/skill-sources.md @@ -0,0 +1,49 @@ + + +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [TODO: `` — curated skill sources](#todo-organization-name--curated-skill-sources) + - [Curated sources](#curated-sources) + + + + + +# TODO: `` — curated skill sources + +The external [skill sources](../../docs/skill-sources/README.md) this +organization **vouches for**. 
A project that sets `organization: ` +sees these as candidate sources it *may* adopt — curation is **not** +installation. The project still opts each one in by committing its pin to +[`/skill-sources.md`](../../projects/_template/skill-sources.md). + +Leave the list empty if the organization curates none; projects can still +trust a source directly in their own `skill-sources.md`. + +## Curated sources + +Each entry is a [source descriptor](../../docs/skill-sources/README.md#source-descriptor). +Declare only sources this organization stands behind for every project +under it. + +```yaml +# - id: # unique, kebab-case +# organization: # must match this directory's name +# name: "" +# maintainer: "" +# method: +# url: +# ref: +# # verification anchor: commit (git-tag) | sha512 (svn-zip) +# layout: +# skills_root: skills +# evals_root: tools/skill-evals/evals +# provides: +# - skill: +# - family: -* +``` + +Also add a row to the *Org-curated sources* table in +[`docs/skill-sources/registry.md`](../../docs/skill-sources/registry.md) +so the source is discoverable. diff --git a/organizations/independent/skill-sources.md b/organizations/independent/skill-sources.md new file mode 100644 index 00000000..a926dbba --- /dev/null +++ b/organizations/independent/skill-sources.md @@ -0,0 +1,25 @@ + + +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [Independent — curated skill sources](#independent--curated-skill-sources) + - [Curated sources](#curated-sources) + + + + + +# Independent — curated skill sources + +The `independent` organization is the no-formal-governing-body baseline, so +it **curates no skill sources** — there is no governing body to vouch on a +project's behalf. 
A project under `organization: independent` (the default) +that wants an external skill trusts the source **directly** by writing its +full [descriptor](../../docs/skill-sources/README.md#source-descriptor) into +its own [`/skill-sources.md`](../../projects/_template/skill-sources.md). + +## Curated sources + +*(none — by design)*. See [`docs/skill-sources/README.md`](../../docs/skill-sources/README.md) +for how an adopter trusts a source without organization curation. diff --git a/projects/_template/README.md b/projects/_template/README.md index ad410b28..f68b23c3 100644 --- a/projects/_template/README.md +++ b/projects/_template/README.md @@ -15,6 +15,7 @@ - [Issue management](#issue-management) - [Repo-health audits](#repo-health-audits) - [PR triage and review](#pr-triage-and-review) + - [External skill sources](#external-skill-sources) - [Recommended setup order](#recommended-setup-order) - [Checklist after copying](#checklist-after-copying) - [Cross-references](#cross-references) @@ -156,6 +157,12 @@ fill them in. > values, which new adopters can use as a reference when drafting > their own configuration). +### External skill sources + +| File | Purpose | +|---|---| +| [`skill-sources.md`](skill-sources.md) | **The install gate** for pulling skills/families from trusted external repos. Lists the source ids this project trusts and commits each pin. `/magpie-setup` fetches only what is listed here. Leave empty to run only in-tree framework skills. See [`docs/skill-sources/`](../../docs/skill-sources/README.md). | + ## Recommended setup order After copying the template, fill in the core project files before the diff --git a/projects/_template/project.md b/projects/_template/project.md index fb1236ac..5f79b4a0 100644 --- a/projects/_template/project.md +++ b/projects/_template/project.md @@ -505,4 +505,5 @@ product: - [`fix-workflow.md`](fix-workflow.md) — fork / toolchain / commit-trailer specifics. 
- [`naming-conventions.md`](naming-conventions.md) — project-specific editorial rules. - [`canned-responses.md`](canned-responses.md) — reporter-facing reply templates. +- [`skill-sources.md`](skill-sources.md) — trusted external skill sources this project pulls skills from (the install gate). - [`README.md`](README.md) — project file index + onboarding checklist. diff --git a/projects/_template/skill-sources.md b/projects/_template/skill-sources.md new file mode 100644 index 00000000..a23c25f5 --- /dev/null +++ b/projects/_template/skill-sources.md @@ -0,0 +1,72 @@ + + +**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* + +- [TODO: `` — trusted skill sources](#todo-project-name--trusted-skill-sources) + - [Trusted sources](#trusted-sources) + - [Selecting what to pull](#selecting-what-to-pull) + + + + + +# TODO: `` — trusted skill sources + +**This file is the install gate.** It lists the external +[skill sources](../../docs/skill-sources/README.md) this project trusts and +pins. Per [`PRINCIPLES.md` §13](../../PRINCIPLES.md#13-snapshot-plus-override-never-vendored-copies), +`/magpie-setup` fetches a source **only if it is listed here** — an +organization curating a source, or the registry listing one, never triggers +an install on its own. Committing this file is the adopter's explicit act of +vouching for each source. + +Leave the list empty (the default) to run only in-tree framework skills. + +## Trusted sources + +Each entry is a [source descriptor](../../docs/skill-sources/README.md#source-descriptor). +You may **reference** a source your organization curated in +`organizations//skill-sources.md` by `id` alone (the descriptor is +inherited), or declare a full descriptor here for a source your org does +not curate. Either way the pin — `method` + `url` + `ref` + the per-method +verification anchor — must be committed. 
+ +```yaml +# Reference an org-curated source by id (descriptor inherited from +# organizations//skill-sources.md), still pinning the ref you trust: +# - id: +# ref: +# commit: # git-tag anchor (or sha512: for svn-zip) + +# ...or trust a source your organization does not curate, in full: +# - id: +# organization: # must name a directory under organizations/ +# name: "" +# maintainer: "" +# method: +# url: +# ref: +# commit: # git-tag; or sha512: for svn-zip +# layout: +# skills_root: skills +# evals_root: tools/skill-evals/evals +# provides: +# - skill: +# - family: -* +``` + +`git-branch` (branch-tip tracking, no cryptographic anchor) is WIP-only — +prefer `git-tag` or `svn-zip` for anything you depend on, exactly as for the +framework snapshot pin in [`.apache-magpie.lock`](../../skills/setup/SKILL.md). + +## Selecting what to pull + +Running `/magpie-setup skill-sources` (or the source pass folded into +`/magpie-setup` adoption) reads this file, fetches + verifies each trusted +source into the gitignored `.apache-magpie-sources//`, writes the +committed [`.apache-magpie.sources.lock`](../../docs/skill-sources/README.md#how-a-trusted-skill-is-installed) +pin, and symlinks in the `provides` skills exactly like framework-family +skills. From then on each pulled skill behaves like an in-tree one — same +`magpie-` relay, same override layer under `.apache-magpie-overrides/`, same +eval binding. diff --git a/tools/skill-and-tool-validator/src/skill_and_tool_validator/__init__.py b/tools/skill-and-tool-validator/src/skill_and_tool_validator/__init__.py index fce2c731..b9b504b6 100644 --- a/tools/skill-and-tool-validator/src/skill_and_tool_validator/__init__.py +++ b/tools/skill-and-tool-validator/src/skill_and_tool_validator/__init__.py @@ -227,6 +227,24 @@ CAPABILITY_SYNC_CATEGORY = "capability-sync" # Eval-coverage check: every skill must have a matching eval suite. EVAL_COVERAGE_CATEGORY = "eval-coverage" + +# Trusted-external-skill-source checks (HARD). 
A `skills//source.md` +# pointer redirects a skill to an external source instead of a local +# SKILL.md; real source descriptors are declared in `organizations//skill-sources.md` +# and `/skill-sources.md` (the `docs/skill-sources/*.md` fences are illustrative +# examples only). See docs/skill-sources/README.md and RFC-AI-0006. +SKILL_SOURCE_CATEGORY = "skill-source" +SKILL_SOURCE_POINTER_FILE = "source.md" +SKILL_SOURCE_FILENAME = "skill-sources.md" +SKILL_SOURCES_DOCS_DIR = Path("docs/skill-sources") +PROJECTS_DIR = Path("projects") +# Install methods a source pin may use — the same three the framework +# snapshot supports (git-tag and svn-zip carry a verification anchor; git-branch tracks a tip). +INSTALL_METHODS = frozenset({"git-tag", "git-branch", "svn-zip"}) +# Frontmatter keys a `source.md` pointer must declare. +REQUIRED_POINTER_KEYS = frozenset({"source", "organization", "skill_path", "evals_path"}) +# Top-level keys a source descriptor must declare. +REQUIRED_DESCRIPTOR_KEYS = frozenset({"id", "organization", "name", "method", "url", "ref", "provides"}) _SKILL_TABLE_HEADER = "## Capability to skill map" _TOOL_TABLE_HEADER = "## Capability to tool map" # Tokens like `capability:triage`, `contract:source-control`, @@ -474,6 +492,7 @@ def _read_mode_table() -> dict[str, str]: NAME_CONVENTION_CATEGORY, LICENSE_HEADER_CATEGORY, STATUS_CATEGORY, + SKILL_SOURCE_CATEGORY, } ) ALL_CATEGORIES = HARD_CATEGORIES | SOFT_CATEGORIES @@ -2613,6 +2632,205 @@ def validate_override_contract(root: Path | None = None) -> Iterable[Violation]: yield from validate_override_file(override_file, text) +# --------------------------------------------------------------------------- +# Trusted external skill sources — pointer + descriptor checks (HARD) +# --------------------------------------------------------------------------- + + +def is_skill_source_pointer(skill_dir: Path) -> bool: + """True when ``skill_dir`` is a trusted-source *pointer* directory — it + carries a ``source.md`` redirect and no local ``SKILL.md`` (the real + SKILL.md is fetched into the
snapshot at adopt time).""" + return (skill_dir / SKILL_SOURCE_POINTER_FILE).exists() and not (skill_dir / "SKILL.md").exists() + + +def collect_skill_source_pointers(root: Path | None = None) -> list[Path]: + """Return every ``skills//`` directory that is a source pointer.""" + base = (root or find_repo_root()) / SKILLS_DIR + if not base.exists(): + return [] + return sorted(d for d in base.iterdir() if d.is_dir() and is_skill_source_pointer(d)) + + +def _skill_source_descriptor_files(root: Path) -> list[Path]: + """Return the markdown files that may declare *real* source descriptors: + each organization's and each project's ``skill-sources.md``. The spec + docs under ``docs/skill-sources/`` are excluded — their YAML fences are + illustrative (placeholder-valued) examples, not declarations.""" + files: list[Path] = [] + for base in (root / ORGANIZATIONS_DIR, root / PROJECTS_DIR): + if base.exists(): + files.extend(sorted(base.glob(f"*/{SKILL_SOURCE_FILENAME}"))) + return files + + +def _iter_yaml_fence_lines(text: str) -> Iterable[str]: + """Yield the raw lines inside ```` ```yaml ```` / ```` ```yml ```` fenced + blocks of a markdown document (fence markers excluded).""" + in_fence = False + for raw in text.splitlines(): + stripped = raw.strip() + if not in_fence: + if stripped.startswith("```yaml") or stripped.startswith("```yml"): + in_fence = True + continue + if stripped.startswith("```"): + in_fence = False + continue + yield raw + + +def parse_source_descriptors(text: str) -> list[dict[str, object]]: + """Parse skill-source descriptors from the ```yaml fences of a + skill-sources markdown file. + + Only *uncommented* lines count, so the commented examples in the + template files declare nothing. A descriptor begins at an ``id:`` (or + ``- id:``) line; each non-empty ``key: value`` below it is captured (nested + ones too; the first occurrence wins), and all keys, even bare block headers like ``provides:``, are + kept under ``_keys`` for presence checks.
Stdlib-only — no YAML dep, in + keeping with the rest of the validator.""" + descriptors: list[dict[str, object]] = [] + cur: dict[str, object] | None = None + for raw in _iter_yaml_fence_lines(text): + if not raw.strip() or raw.lstrip().startswith("#"): + continue + line = raw.rstrip() + m_id = re.match(r"^\s*(?:-\s+)?id:\s*(\S+)\s*$", line) + if m_id: + if cur is not None: + descriptors.append(cur) + source_id = m_id.group(1).strip().strip("'\"") + # A placeholder id (``) marks an illustrative example, + # not a declaration — ignore it and the lines that follow until + # the next real id. + if "<" in source_id or ">" in source_id: + cur = None + continue + cur = {"id": source_id, "_keys": {"id"}} + continue + if cur is None: + continue + m_kv = re.match(r"^\s*(?:-\s+)?([A-Za-z_][\w-]*):\s*(.*)$", line) + if m_kv: + key = m_kv.group(1) + val = m_kv.group(2).strip().strip("'\"") + keys = cur["_keys"] + assert isinstance(keys, set) + keys.add(key) + if val and key not in cur: + cur[key] = val + if cur is not None: + descriptors.append(cur) + return descriptors + + +def collect_known_source_ids(root: Path | None = None) -> set[str]: + """Return the set of source ids declared across every skill-sources + descriptor file. A ``source.md`` pointer must reference one of these.""" + repo_root = root or find_repo_root() + ids: set[str] = set() + for path in _skill_source_descriptor_files(repo_root): + try: + text = path.read_text(encoding="utf-8") + except OSError: + continue + for desc in parse_source_descriptors(text): + ids.add(str(desc["id"])) + return ids + + +def validate_skill_source_descriptors(root: Path | None = None) -> Iterable[Violation]: + """Validate every declared (uncommented) source descriptor: required + keys present, a supported install ``method``, and a known + ``organization``. 
Commented template examples declare nothing and are + skipped.""" + repo_root = root or find_repo_root() + orgs = known_organizations(repo_root) + for path in _skill_source_descriptor_files(repo_root): + try: + text = path.read_text(encoding="utf-8") + except OSError: + continue + for desc in parse_source_descriptors(text): + keys = desc["_keys"] + assert isinstance(keys, set) + missing = REQUIRED_DESCRIPTOR_KEYS - keys + for key in sorted(missing): + yield Violation( + path, + 1, + f"skill-source descriptor '{desc.get('id', '?')}' missing required key: '{key}'", + category=SKILL_SOURCE_CATEGORY, + ) + method = desc.get("method") + if method is not None and method not in INSTALL_METHODS: + yield Violation( + path, + 1, + f"skill-source descriptor '{desc.get('id', '?')}' method '{method}' " + f"not in {sorted(INSTALL_METHODS)}", + category=SKILL_SOURCE_CATEGORY, + ) + org = desc.get("organization") + if org is not None and orgs and org not in orgs: + yield Violation( + path, + 1, + f"skill-source descriptor '{desc.get('id', '?')}' organization '{org}' " + f"is not a known organization {sorted(orgs)}", + category=ORGANIZATION_CATEGORY, + ) + + +def validate_skill_source_pointers(root: Path | None = None) -> Iterable[Violation]: + """Validate every ``skills//source.md`` redirect pointer: required + frontmatter keys present, a known ``organization``, and a ``source`` that + resolves to a declared descriptor.""" + repo_root = root or find_repo_root() + orgs = known_organizations(repo_root) + known_ids = collect_known_source_ids(repo_root) + for skill_dir in collect_skill_source_pointers(repo_root): + path = skill_dir / SKILL_SOURCE_POINTER_FILE + try: + text = path.read_text(encoding="utf-8") + except OSError as exc: + yield Violation(path, None, f"cannot read source pointer: {exc}", category=SKILL_SOURCE_CATEGORY) + continue + fm = parse_frontmatter(text) + if fm is None: + yield Violation( + path, + 1, + "source pointer missing YAML frontmatter block (expected '---' 
at start)", + category=SKILL_SOURCE_CATEGORY, + ) + continue + for key in sorted(REQUIRED_POINTER_KEYS - set(fm.keys())): + yield Violation( + path, 1, f"source pointer missing required key: '{key}'", category=SKILL_SOURCE_CATEGORY + ) + org = fm.get("organization") + if org and orgs and org not in orgs: + yield Violation( + path, + 1, + f"source pointer organization '{org}' is not a known organization {sorted(orgs)} " + f"— add organizations/{org}/ or fix the value", + category=ORGANIZATION_CATEGORY, + ) + src = fm.get("source") + if src and src not in known_ids: + yield Violation( + path, + 1, + f"source pointer references unknown source '{src}' — declare it in an " + f"organizations//{SKILL_SOURCE_FILENAME} or /{SKILL_SOURCE_FILENAME} " + f"descriptor {sorted(known_ids) or '(none declared)'}", + category=SKILL_SOURCE_CATEGORY, + ) + + # --------------------------------------------------------------------------- # Eval-coverage check (check #9, SOFT) # --------------------------------------------------------------------------- @@ -2637,6 +2855,11 @@ def validate_eval_coverage(root: Path | None = None) -> Iterable[Violation]: for skill_dir in sorted(skills_base.iterdir()): if not skill_dir.is_dir(): continue + # A trusted-external-skill-source pointer dir carries its eval suite + # in the source repo, fetched into the snapshot at adopt time — not + # in-tree. Do not demand a local eval suite for it. + if is_skill_source_pointer(skill_dir): + continue slug = skill_dir.name if slug not in eval_slugs: yield Violation( @@ -2867,6 +3090,11 @@ def run_validation(root: Path | None = None) -> list[Violation]: # Eval-coverage check: every skill must have a matching eval suite. violations.extend(validate_eval_coverage(repo_root)) + # Trusted-external-skill-source checks: source.md pointers resolve to a + # declared, well-formed descriptor with a known organization. 
+ violations.extend(validate_skill_source_descriptors(repo_root)) + violations.extend(validate_skill_source_pointers(repo_root)) + # docs/modes.md consistency check: skill lists and counts match live frontmatter. violations.extend(validate_modes_doc_consistency(repo_root)) diff --git a/tools/skill-and-tool-validator/tests/test_validator.py b/tools/skill-and-tool-validator/tests/test_validator.py index 74df5622..0edbba81 100644 --- a/tools/skill-and-tool-validator/tests/test_validator.py +++ b/tools/skill-and-tool-validator/tests/test_validator.py @@ -46,11 +46,13 @@ MAX_METADATA_CHARS, MODES_DOC_CATEGORY, MULTI_CAPABILITY_CATEGORY, + ORGANIZATION_CATEGORY, OVERRIDE_CONTRACT_CATEGORY, OVERRIDES_DIR, PRINCIPLE_CATEGORY, PRIVACY_CATEGORY, SECURITY_PATTERN_CATEGORY, + SKILL_SOURCE_CATEGORY, SOFT_CATEGORIES, STATUS_CATEGORY, TEMPLATE_DRIFT_CATEGORY, @@ -59,16 +61,20 @@ _read_mode_table, collect_doc_files, collect_files_to_check, + collect_known_source_ids, collect_skill_dirs, + collect_skill_source_pointers, collect_tool_python_files, extract_headings, find_repo_root, is_path_allowlisted, is_placeholder_url, + is_skill_source_pointer, known_organizations, line_has_inline_allow_marker, main, parse_frontmatter, + parse_source_descriptors, resolve_link, run_validation, slugify, @@ -91,6 +97,8 @@ validate_privacy_patterns, validate_project_template_drift, validate_security_patterns, + validate_skill_source_descriptors, + validate_skill_source_pointers, validate_tools, validate_trigger_preservation, ) @@ -3938,3 +3946,156 @@ def test_all_violations_are_soft_category(self, tmp_path: Path) -> None: violations = list(validate_project_template_drift(tmp_path)) for v in violations: assert v.category == TEMPLATE_DRIFT_CATEGORY + + +# --------------------------------------------------------------------------- +# Trusted external skill sources — pointers + descriptors (RFC-AI-0006) +# --------------------------------------------------------------------------- + 
+_REAL_DESCRIPTOR_FENCE = ( + "```yaml\n" + "- id: acme-skills\n" + " organization: ASF\n" + ' name: "Acme Skills"\n' + ' maintainer: "acme"\n' + " method: git-tag\n" + " url: https://github.com/acme/skills\n" + " ref: v1.0.0\n" + " commit: abc123def456\n" + " layout:\n" + " skills_root: skills\n" + " evals_root: tools/skill-evals/evals\n" + " provides:\n" + " - skill: acme-thing\n" + "```\n" +) + +_COMMENTED_DESCRIPTOR_FENCE = ( + "```yaml\n" + "# - id: \n" + "# organization: \n" + "# method: \n" + "```\n" +) + + +def _make_source_repo( + tmp_path: Path, + *, + org: str = "ASF", + descriptor_fence: str = _REAL_DESCRIPTOR_FENCE, + pointer_frontmatter: str | None = "source: acme-skills\norganization: ASF\n" + "skill_path: skills/acme-thing\nevals_path: tools/skill-evals/evals/acme-thing", + pointer_dir: str = "acme-thing", +) -> Path: + """Build a minimal repo with an organization, an org skill-sources.md + descriptor, and (optionally) a skills//source.md pointer.""" + (tmp_path / "organizations" / org).mkdir(parents=True) + (tmp_path / "organizations" / org / "skill-sources.md").write_text( + f"# {org} — curated skill sources\n\n## Curated sources\n\n{descriptor_fence}", + encoding="utf-8", + ) + if pointer_frontmatter is not None: + pdir = tmp_path / "skills" / pointer_dir + pdir.mkdir(parents=True) + (pdir / "source.md").write_text( + f"---\n{pointer_frontmatter}\n---\n\n# {pointer_dir} — redirect\n", + encoding="utf-8", + ) + return tmp_path + + +class TestSourceDescriptorParsing: + def test_commented_examples_declare_nothing(self) -> None: + assert parse_source_descriptors(_COMMENTED_DESCRIPTOR_FENCE) == [] + + def test_placeholder_id_is_ignored(self) -> None: + text = "```yaml\nid: \norganization: \n```\n" + assert parse_source_descriptors(text) == [] + + def test_real_descriptor_parsed(self) -> None: + descs = parse_source_descriptors(_REAL_DESCRIPTOR_FENCE) + assert len(descs) == 1 + d = descs[0] + assert d["id"] == "acme-skills" + assert d["organization"] 
== "ASF" + assert d["method"] == "git-tag" + assert "provides" in d["_keys"] + + def test_collect_known_source_ids(self, tmp_path: Path) -> None: + _make_source_repo(tmp_path, pointer_frontmatter=None) + assert collect_known_source_ids(tmp_path) == {"acme-skills"} + + +class TestSkillSourcePointer: + def test_is_pointer_true_for_source_md_only(self, tmp_path: Path) -> None: + _make_source_repo(tmp_path) + pdir = tmp_path / "skills" / "acme-thing" + assert is_skill_source_pointer(pdir) + assert [p.name for p in collect_skill_source_pointers(tmp_path)] == ["acme-thing"] + + def test_is_pointer_false_when_skill_md_present(self, tmp_path: Path) -> None: + _make_source_repo(tmp_path) + pdir = tmp_path / "skills" / "acme-thing" + (pdir / "SKILL.md").write_text("x", encoding="utf-8") + assert not is_skill_source_pointer(pdir) + + def test_valid_pointer_passes(self, tmp_path: Path) -> None: + _make_source_repo(tmp_path) + assert list(validate_skill_source_pointers(tmp_path)) == [] + + def test_unknown_source_hard_fails(self, tmp_path: Path) -> None: + _make_source_repo( + tmp_path, + pointer_frontmatter="source: ghost-source\norganization: ASF\n" + "skill_path: skills/acme-thing\nevals_path: tools/skill-evals/evals/acme-thing", + ) + vs = list(validate_skill_source_pointers(tmp_path)) + assert any(v.category == SKILL_SOURCE_CATEGORY and "ghost-source" in v.message for v in vs) + + def test_unknown_org_hard_fails(self, tmp_path: Path) -> None: + _make_source_repo( + tmp_path, + pointer_frontmatter="source: acme-skills\norganization: Nope\n" + "skill_path: skills/acme-thing\nevals_path: tools/skill-evals/evals/acme-thing", + ) + vs = list(validate_skill_source_pointers(tmp_path)) + assert any(v.category == ORGANIZATION_CATEGORY and "Nope" in v.message for v in vs) + + def test_missing_required_key(self, tmp_path: Path) -> None: + _make_source_repo( + tmp_path, + pointer_frontmatter="source: acme-skills\norganization: ASF", + ) + vs = 
list(validate_skill_source_pointers(tmp_path)) + msgs = " ".join(v.message for v in vs) + assert "skill_path" in msgs and "evals_path" in msgs + + def test_pointer_dir_draws_no_eval_coverage_advisory(self, tmp_path: Path) -> None: + _make_source_repo(tmp_path) + vs = list(validate_eval_coverage(tmp_path)) + assert not any("acme-thing" in v.message for v in vs) + + +class TestSkillSourceDescriptorValidation: + def test_valid_descriptor_passes(self, tmp_path: Path) -> None: + _make_source_repo(tmp_path, pointer_frontmatter=None) + assert list(validate_skill_source_descriptors(tmp_path)) == [] + + def test_unknown_method_hard_fails(self, tmp_path: Path) -> None: + bad = _REAL_DESCRIPTOR_FENCE.replace("method: git-tag", "method: rsync") + _make_source_repo(tmp_path, descriptor_fence=bad, pointer_frontmatter=None) + vs = list(validate_skill_source_descriptors(tmp_path)) + assert any(v.category == SKILL_SOURCE_CATEGORY and "rsync" in v.message for v in vs) + + def test_unknown_org_hard_fails(self, tmp_path: Path) -> None: + bad = _REAL_DESCRIPTOR_FENCE.replace("organization: ASF", "organization: Nope") + _make_source_repo(tmp_path, descriptor_fence=bad, pointer_frontmatter=None) + vs = list(validate_skill_source_descriptors(tmp_path)) + assert any(v.category == ORGANIZATION_CATEGORY and "Nope" in v.message for v in vs) + + def test_missing_required_key(self, tmp_path: Path) -> None: + bad = _REAL_DESCRIPTOR_FENCE.replace(" url: https://github.com/acme/skills\n", "") + _make_source_repo(tmp_path, descriptor_fence=bad, pointer_frontmatter=None) + vs = list(validate_skill_source_descriptors(tmp_path)) + assert any(v.category == SKILL_SOURCE_CATEGORY and "url" in v.message for v in vs)