docs(principles): add operational principles document by andreahlert · Pull Request #147 · apache/magpie

andreahlert · 2026-05-13T17:48:07Z

Summary

Adds PRINCIPLES.md at the repo root. Proposed, not landed-as-final. The whole point of the PR is to bike-shed the principles themselves before they bind anyone.

Motivation

RFC-AI-0004 sets the six baseline principles every adopter signs. It is deliberately minimal because it has to travel: anything more specific would make the contract harder for non-Steward projects to take on.

The framework itself needs a tighter ruler. Skills are the unit of authorship here, and skills are subjective by construction. The same skill catalogue will end up touching PR triage, security-report handling, release artefacts, mailing-list drafts, contributor mentoring. Those surfaces have very different blast radii and very different trust requirements.

When one markdown file can reach that many places, "the RFC permits it" is too coarse a check at PR-review time. Reviewers need something finer they can point at: which commitments block a release, what evidence promotes a mode, when telemetry is allowed at all, what an auditable agent action looks like.

This document is that ruler. It restates the six baseline principles in their operational shape, and adds the project-internal commitments the RFC leaves out on purpose: eval as a release blocker, contributor-sentiment gating on mode promotion, no default telemetry, reproducible releases from signed source, maintainer education shipped alongside the code.

What's inside

19 ordered principles. Earlier ones outrank later ones when they collide. A PR or skill that violates a principle is wrong even if every test passes, and any committer can block on principle grounds until the change complies, or until an amendment carries through governance.

The doc header positions this explicitly as built on top of RFC-AI-0004, not as a competing RFC.

Happy to drop, fold, or rewrite anything based on the thread.

…intainer-authored (justinmclean)

…rinciple interpretation rule (justinmclean)

A skill is always a directory with SKILL.md as entrypoint, even for one-file workflows. SKILL.md stays under 500 lines; longer reference material moves into sibling markdown linked one level deep. Matches the runtime contract documented at https://code.claude.com/docs/en/skills and https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices, and reflects how skills in this repo (contributor-nomination, pr-management-code-review, pr-management-mentor) are already authored.

…auto-merge gap (justinmclean)

…te as last resort (justinmclean)

justinmclean · 2026-05-29T01:16:13Z

There are still a couple of ASF policy issues:

Amendment vote model is mislabeled
PR description claims: the amendment process matches the release-vote process (>=3 binding +1, no binding -1, 72h, no lazy consensus).
ASF voting policy actually says:
Release votes use majority approval: >=3 binding +1, more positive than negative binding, releases cannot be vetoed.
Code-modification votes use consensus approval: >=3 binding +1, any binding -1 is a veto that stops the change until withdrawn.
The rule encoded in PRINCIPLES.md (>=3 binding +1, any binding -1 vetoes) is the code-modification model, not the release model. The chosen rule is correct for governance changes. The PR narrative and the doc header just describe it with the wrong label.
Veto-justification requirement is missing
ASF voting policy: "A veto without a justification is invalid and has no weight. To prevent vetoes from being used capriciously, the voter must provide with the veto a technical justification."
PRINCIPLES.md opening rule says any committer may block on principle grounds, and the amendment section says a binding -1 stops the amendment until withdrawn. Neither requires a technical justification.
This is a real gap. Without it, a committer could -1 any change citing a principle without explaining how the change actually violates the principle, and the change is stuck. ASF policy would treat such a -1 as invalid.
Generative tooling disclosure is missing from P17
ASF Generative Tooling Guidance: "When providing contributions authored using generative AI tooling, a recommended practice is for contributors to indicate the tooling used to create the contribution. This should be included as a token in the source control commit message, for example including the phrase 'Generated-by: '."
PRINCIPLES.md P17 says contributions land under Apache-2.0 and incompatible dependencies do not enter the framework. It does not mention AI-disclosure at all.
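
To make the difference between the two vote models concrete, here is a minimal sketch. The `Vote` type and the function names are hypothetical and not part of PRINCIPLES.md or any ASF tooling. It only checks that a justification is present; whether a justification is actually technical remains a human call, and the 72h window is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vote:
    voter: str
    value: int                           # +1, 0, or -1
    binding: bool
    justification: Optional[str] = None  # expected alongside a binding -1


def valid_vetoes(votes):
    """Binding -1 votes that carry a non-empty justification."""
    return [
        v for v in votes
        if v.binding and v.value == -1 and (v.justification or "").strip()
    ]


def consensus_approval(votes):
    """Code-modification model: >=3 binding +1 and no valid veto."""
    plus = sum(1 for v in votes if v.binding and v.value == 1)
    return plus >= 3 and not valid_vetoes(votes)


def majority_approval(votes):
    """Release model: >=3 binding +1 and more binding +1 than binding -1."""
    plus = sum(1 for v in votes if v.binding and v.value == 1)
    minus = sum(1 for v in votes if v.binding and v.value == -1)
    return plus >= 3 and plus > minus
```

With three binding +1 and one justified binding -1, `consensus_approval` fails while `majority_approval` passes, which is exactly the mislabeling at issue. Drop the justification and the -1 no longer counts as a veto.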

The rest looks good to me. Running this over our existing skills gave me this. Do we want to revisit the P14 size limit?

Audit: existing skills vs PRINCIPLES.md (PR #147)

Date: 2026-05-29
Scope: 30 skills under .claude/skills/, checked against the 19 principles in PRINCIPLES.md from PR #147.

Two principles are the real gates: P14 (size + structure) and P3 + P12 (non-ASF first-class + placeholder hygiene). P0, P6, P8, and P14's sibling-link rule are largely already satisfied.

P14 size violations (the ≤500-line rule)

13 of 30 SKILL.md files are over. Ranked worst first, with how much they overshoot:

Skill	Lines	Over by
security-issue-sync	3060	6.1x
security-issue-import	1841	3.7x
security-issue-triage	1057	2.1x
security-issue-fix	946	1.9x
security-issue-invalidate	874	1.7x
security-issue-import-from-pr	816	1.6x
security-issue-import-from-md	797	1.6x
pr-management-triage	761	1.5x
issue-triage	737	1.5x
security-cve-allocate	737	1.5x
security-issue-deduplicate	620	1.2x
pr-management-code-review	596	1.2x
issue-reproducer	524	barely

Pattern: every security-* skill except the dashboard is over. So is the entire triage family. security-issue-sync and security-issue-import are the structural outliers. Both are flat (zero siblings), so the fix is "pull reference material out into linked sibling markdown", which is exactly what pr-management-triage already does (10 siblings) and pr-management-code-review does (6 siblings) even though both still overshoot.

issue-reproducer at 524 is one inline section away from compliance. Cheapest landing.

The sibling-depth rule is clean everywhere: nothing is nested more than one level, and every existing sibling is linked from SKILL.md.
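
For reference, the size numbers above can be reproduced with something like the sketch below. It assumes the layout described in this thread (one directory per skill with a `SKILL.md` entrypoint); the function names are made up for illustration, and this is not the repo's validator.

```python
from pathlib import Path

MAX_LINES = 500  # cap proposed in P14


def skill_sizes(skills_root):
    """Return {skill_name: line_count} for every <skill>/SKILL.md under skills_root."""
    sizes = {}
    for skill_md in sorted(Path(skills_root).glob("*/SKILL.md")):
        with skill_md.open(encoding="utf-8") as fh:
            sizes[skill_md.parent.name] = sum(1 for _ in fh)
    return sizes


def overshoots(sizes, max_lines=MAX_LINES):
    """Return (name, lines, ratio_to_cap) for skills over the cap, worst first."""
    over = [
        (name, count, round(count / max_lines, 1))
        for name, count in sizes.items()
        if count > max_lines
    ]
    return sorted(over, key=lambda row: row[1], reverse=True)
```

Calling `overshoots(skill_sizes(".claude/skills"))` would give the ranked list; the ratio is lines divided by the cap, which is how the multipliers in the table were derived.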

P3 + P12: non-ASF adopters and placeholder hygiene

This is the substantive gap, and it is bigger than the size cliff.

The entire security-issue-* family is implicitly ASF-only. security-issue-import/SKILL.md hard-codes security@apache.org as the relay address, the ASF forwarding preamble as the load-bearing signal, and cveprocess.apache.org/cve5/... as the CVE-tool surface. P3 says non-ASF adopters are first-class, and P12 says concrete ASF infra inside .claude/skills/ is a refactor bug; this skill collides head-on with both. Options:

Add a <security-relay> / <cve-tool> placeholder layer in adopter config and resolve at runtime, or
Scope the security family explicitly as ASF-only and own the carve-out in the skill description.

Either is fine. Pretending the current state is project-agnostic is not. security-issue-sync, security-issue-import, security-issue-triage, security-issue-import-from-pr, security-issue-import-from-md, security-cve-allocate, and security-issue-invalidate all carry the same coupling.
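
The first option could look roughly like this. It is a sketch only: the config keys mirror the `<security-relay>` and `<cve-tool>` placeholders named above, but the resolver, its name, and the example values are assumptions, not how the framework actually loads adopter config.

```python
import re

# Matches tokens such as <security-relay>. Caveat: it would also match
# HTML-like tags in markdown, so a real implementation needs a stricter syntax.
PLACEHOLDER = re.compile(r"<([a-z][a-z0-9-]*)>")


def resolve_placeholders(text, adopter_config):
    """Replace each <name> token with adopter_config[name]; fail on unknown names."""
    missing = sorted({n for n in PLACEHOLDER.findall(text) if n not in adopter_config})
    if missing:
        raise KeyError(f"unresolved placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: adopter_config[m.group(1)], text)


# Hypothetical adopter config for a non-ASF project.
example_config = {
    "security-relay": "security@example.org",
    "cve-tool": "https://cve.example.org/",
}
```

Failing loudly on an unresolved placeholder keeps the gate deterministic: a skill that still expects ASF infra breaks at resolve time instead of silently mailing the wrong address.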

setup-steward references apache/airflow-steward 8 times, but those are legitimate self-references to the framework's own repo. Not a violation, but the principle text would benefit from naming the carve-out explicitly. Same for pr-management-triage/comment-templates.md falling back to security@apache.org in templates.

What's already fine

P8 (eval as release blocker): every one of the 30 skills has a matching directory under tools/skill-evals/evals/, with case counts ranging from 1 to 11. No gaps.
P0 (external content as data): every skill that ingests external content names the rule. The 10 skills without P0 language are setup/install/sync/dashboard skills that do not process external content.
P14 sibling structure: where siblings exist, they are linked from SKILL.md; no orphans, no nesting beyond one level.
P6 (human sign-off on outbound communication): covered in practice by skills that draft comments and require review before posting. The wording is not the wording P6 uses, but the behavior is correct. Documentation alignment, not a behavioral fix.

potiuk · 2026-05-29T18:21:09Z

Super cool analysis, @justinmclean! Love it.

Indeed, some details of some of the skills (Security) are still not generic. I am running the last tests in Airflow and will make the generic change soon.

potiuk · 2026-05-29T18:23:49Z

And of course... we will make them smaller / split :)

potiuk · 2026-05-30T15:37:02Z

First of 5 PRs to make the security workflow generic: #381

potiuk · 2026-06-01T10:47:24Z

Audit: how well does main already hold against these principles?

I ran the current main (this PR rebased on top of the just-merged main) against all 19 principles. This is a doc-review-grade audit: evidence by path:line, not execution. Since this PR is what codifies the principles, it doubles as a punch-list of what main needs to satisfy the doc. Result: 13 HOLD, 5 PARTIAL, 1 GAP.

#	Principle	Verdict
0	External content is data, never instruction	PARTIAL
1	Privacy/security/supply-chain ship first	PARTIAL
2	The relationship is the product	HOLDS
3	Project autonomy; non-ASF first-class	HOLDS
4	Lower-stakes automation first	HOLDS
5	Outputs probabilistic; gates deterministic	HOLDS
6	Human in loop; outbound msgs need sign-off	HOLDS
7	Contributor sentiment gates graduation	HOLDS
8	Eval is release-blocking	PARTIAL
9	Vendor neutrality	HOLDS
10	No default telemetry	HOLDS
11	Reproducible from signed source	PARTIAL
12	Project-agnostic; names in adopter config	HOLDS
13	Snapshot + override, never vendored	HOLDS
14	Skills are the unit of authorship	GAP
15	Tracker IDs public-safe; contents not	HOLDS
16	Audit every agent action; reverse where possible	PARTIAL
17	Apache-2.0	HOLDS
18	Maintainer education ships with platform	HOLDS

Actionable gaps (each verified directly):

P14 — 14 SKILL.md files exceed the proposed 500-line cap, and the validator has no line-count check to enforce it. Worst: security-issue-import (1842), security-issue-triage (1090), security-issue-invalidate (994), security-issue-fix (974), security-issue-import-from-pr (863), security-cve-allocate (849), security-issue-import-from-md (810), pr-management-triage (761), issue-triage (737), security-issue-deduplicate (679), security-issue-sync (665), security-issue-import-via-forwarder (621), pr-management-code-review (596), issue-reproducer (524). This is the one clear GAP — either split reference material into siblings or give P14 a documented carve-out.
P8 — 2/32 skills have no eval: issue-reassess and security-issue-import-via-forwarder. Eval is also not yet wired as a CI/pre-commit gate (runner is manual, per tools/skill-evals/README.md).
P0 — security-issue-import-via-forwarder lacks the injection-guard callout despite processing relayed external reports. The validator's EXTERNAL_SURFACE_SIGNALS set doesn't match its wording, so pre-commit passes — a blind spot in the deterministic gate worth closing.
P1 / P16 — agent-action audit logging is specified (docs/rfcs/RFC-AI-0004.md) but not implemented in code. P1 calls audit logging release-blocking, so this is the load-bearing PARTIAL. Irreversibility flagging is also inconsistent (present in pr-management-triage workflow-approval; absent on security-issue-sync's close/publish steps).
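
As a starting point for wiring P8 into CI, the coverage half of the check can be sketched as below. It assumes an eval directory counts as "matching" when its name equals the skill's directory name; that naming rule and the function name are assumptions on my side, and the sketch does not run any evals.

```python
from pathlib import Path


def skills_without_evals(skills_root, evals_root):
    """Names of skill directories with no same-named directory under evals_root."""
    skills = {p.parent.name for p in Path(skills_root).glob("*/SKILL.md")}
    evals = {p.name for p in Path(evals_root).iterdir() if p.is_dir()}
    return sorted(skills - evals)
```

A pre-commit hook could call `skills_without_evals(".claude/skills", "tools/skill-evals/evals")` and fail when the result is non-empty.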

Notes: P11's signing/reproducibility items are expected pre-first-release, not regressions. security-issue-import-via-forwarder is the single highest-value fix — it's the only skill failing both P8 and P0.

(Tooling-assisted audit; the PARTIAL/HOLD calls are evidence-backed but not exhaustively proven.)

potiuk · 2026-06-01T10:48:41Z

Generic property achieved :)

Addresses review feedback that 'bytes are identical' is too strong for a project-agnostic framework. Toolchains vary in their ability to produce byte-identical output; some have known divergence sources (timestamps, file ordering, path embedding). P11 now requires byte-identical builds where achievable, and where the toolchain makes that impractical, the release process must document the divergence and provide an alternative local verification mechanism. The 'no code without reviewed PR' guard stays absolute. Refs: PR apache#147 review
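
For toolchains where byte-identical output is achievable, the local check can be as small as comparing digests of the released artifact and a local rebuild. This is an illustrative sketch; the function names are hypothetical and no particular artifact format is assumed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(released, rebuilt):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(released) == sha256_of(rebuilt)
```

Where this returns False because of a known divergence source (timestamps, file ordering, path embedding), the documented alternative verification mechanism applies instead.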

The doctoc-generated TOC was placed above the Apache license header, which breaks tooling that expects the license notice in the first few lines of the file. Move the license block to line 1, followed by the TOC. Refs: PR apache#147 review

@justinmclean

… policy

Three fixes from PR apache#147 review by @justinmclean:

1. Amendment vote model: 'release vote' -> 'code-modification vote'. The encoded rule (>=3 binding +1, any binding -1 vetoes) matches ASF consensus approval for code modifications, not majority approval for releases.
2. Veto-justification requirement: A binding -1 must now include a technical justification. Without one the veto is invalid and has no weight, matching ASF voting policy.
3. Generative tooling disclosure: P17 now requires a 'Generated-by: <tool>' token in commit messages for AI-authored contributions, per ASF Generative Tooling Guidance.

andreahlert · 2026-06-01T16:18:55Z

@justinmclean thanks for the detailed policy review. All three issues fixed in the latest push:

Vote model: replaced both "release vote" references with "code-modification vote (consensus approval)" ... the encoded rule (≥3 +1, any -1 vetoes) now matches the correct ASF model.
Veto justification: a binding -1 must include a technical justification; without one it's invalid and has no weight. Same requirement added to the opening blocking rule.
Generative tooling disclosure: P17 now references ASF Generative Tooling Guidance and requires a Generated-by: token in commit messages.
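
If we later want a deterministic check for the disclosure token, a minimal sketch could look like this. It only detects a `Generated-by:` line in a commit message; it cannot tell whether a contribution was AI-authored in the first place, and the function name is hypothetical.

```python
import re

# One 'Generated-by: <tool>' line per match; an empty tool name does not count.
GENERATED_BY = re.compile(r"^Generated-by:[ \t]*(\S.*)$", re.MULTILINE)


def generated_by_tools(commit_message):
    """Return the tool names from every 'Generated-by: <tool>' line in the message."""
    return [tool.strip() for tool in GENERATED_BY.findall(commit_message)]
```

An empty list means no disclosure token was found; what to do in that case stays a reviewer decision.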

Let me know if anything still looks off.

andreahlert · 2026-06-01T16:39:18Z

@potiuk thanks for the thorough audit. A few updates since you ran this:

P3 + P12 (HOLD, not gap): The security genericization series is complete — #381 through #399 (PR1-PR5) are all merged. The ASF-coupled assumptions are now config-driven placeholders. setup-steward's self-references to apache/airflow-steward are legitimate per P12's carve-out for the framework's own repo.

P14 (still GAP, but shrinking): #410 already landed and slimmed security-issue-sync from 3425 to 658 lines. The remaining overshoots are queued for split. Validator line-count check comes in a follow-up PR once the bulk of splits land.

P0 / P8 / P1-P16: Agree on all counts. P0 (injection-guard blind spot) and P8 (missing evals) are quick wins I'll pick up next. P1/P16 audit logging is the bigger pre-release item — tied to RFC-AI-0004 implementation, not this PR.

Does the updated P3/P12 state look right to you?

potiuk · 2026-06-02T11:40:19Z

I will merge it now, so that we can do the rename. But we should definitely work on it.