Skip to content

committer-onboarding — post-vote onboarding for committers and PMC members#229

Closed
justinmclean wants to merge 4 commits into
apache:mainfrom
justinmclean:contribitor-onboarding
Closed

committer-onboarding — post-vote onboarding for committers and PMC members#229
justinmclean wants to merge 4 commits into
apache:mainfrom
justinmclean:contribitor-onboarding

Conversation

@justinmclean

Copy link
Copy Markdown
Member

Adds a new committer-onboarding skill that walks a nominator through
every post-vote step after a committer or PMC election passes, for both
incubating podlings and graduated TLPs.

What's included

Skill (.claude/skills/committer-onboarding/)

Four-step workflow covering three scenarios (new-committer,
committer-to-pmc, direct-to-pmc):

  • Step 0 — validate the vote result: binding vote counts, 72-hour
    period check, veto handling (veto requires a justification; code
    quality alone is not sufficient — must relate to conduct or fitness)
  • Step 1 — ICLA check and communications: three ICLA states (on file /
    submitted-not-processed / not filed); draft congratulations email and
    secretary account-creation request; checks that the nominator is a PMC
    chair or ASF Member before drafting the request; detects when the ICLA
    already included project + desired ID (automatic secretary flow)
  • Step 2 — post-account karma checklist: Whimsy roster update (PPMC
    for podlings, PMC/committer for TLPs), issue tracker (Jira only; GitHub
    Issues projects skip this), mailing list self-service, welcome
    announcement draft
  • Step 3 — completion summary with pending_items list for anything
    not yet done

Detail files:

  • detail/email-templates.md — congratulations (3 ICLA variants),
    secretary request, welcome announcement
  • detail/karma-grant.md — step-by-step Whimsy, mailing list, and Jira
    instructions; clarifies that committee-info.txt is the authoritative
    record for PMC membership (not LDAP), and that Whimsy is a convenient
    derived interface that updates both

Eval suite (tools/skill-evals/evals/committer-onboarding/)

24 cases across all four steps.

Privacy wiring (tools/privacy-llm/wiring.md)

Registers committer-onboarding in the skills table. The skill reads
private-list vote content, so Step 0 includes a privacy pre-flight
gate-check (privacy-llm-check --reads-private-list) and a PII handling
table (candidate = not redacted; voters = collaborators, exempt by
default; third-party names in discussion = must be redacted).

@potiuk

potiuk commented May 19, 2026

Copy link
Copy Markdown
Member

Should we have it as "Apache Way" family of skills? The idea for MagPie is to make it - immediately - useful both: in and out of ASF, and unlike most of the other SKILLs this one is strictly ASF specific. Nothing wrong with it, but I think we should open up to "other ways" potentially :)

@justinmclean

Copy link
Copy Markdown
Member Author

Sure I can make this part of an ASF way family of skills. Several of the other skills have ASFism in them - in some cases, project outside teh ASF may do things that way or not.

@justinmclean justinmclean self-assigned this May 20, 2026
@potiuk

potiuk commented May 22, 2026

Copy link
Copy Markdown
Member

Sure I can make this part of an ASF way family of skills. Several of the other skills have ASFism in them - in some cases, project outside teh ASF may do things that way or not.

Yeah. I think we need to make some of those generalisations / review those skills to make sure there are no remaining "airflow-ism", "groov-ism" and "asf-isms" - or at the very least that they are there as optional things that only kick-in in ASF projects.

I think the great benefitt of Magpie being "immediately adoptable" outside of ASF is clear :) -> And I think that should be something we might even add in our PRINCIPLES and skill writing / updating agentic code to make sure that the skills (except those dedicated for a given organisation) are generic enough to be used by anyone.. This IMHO will make Magpie much stronger from day one if we have such rule - and make it "important", because theat might alleviate some questions from outside adopters - and we should be able to clearly say "Yeah - we have some ASF specific things for our PMCs that you can also adopt if you want - but the main idea is to get the project outside-usable, without having to adopt the whole Apache Way.

@potiuk potiuk force-pushed the contribitor-onboarding branch from 37522be to 6376ac7 Compare May 24, 2026 22:54
@justinmclean justinmclean force-pushed the contribitor-onboarding branch from 6376ac7 to 472a5cf Compare May 26, 2026 02:31
Address the skill-validator findings surfaced by the pytest run on
this branch, plus two markdownlint MD040 errors the lint hook
surfaced on the touched committer-onboarding SKILL.md:

- committer-onboarding: add the standard Pattern 4 injection-guard
  callout. The skill reads the <vote-thread> from the mailing-list
  archive, candidate-supplied identity fields (name, email, desired
  Apache ID), and ICLA / Whimsy roster data; the callout names those
  surfaces explicitly. Golden rule 3 already reinforced the same
  principle; this adds the validator-recognised block. (HARD
  violation; was failing the pytest gate.)
- committer-onboarding: tag two existing untagged fenced output
  blocks (Step 0 output, Step 3 completion summary) as `text` so
  markdownlint MD040 stops flagging them on touch.
- security-issue-sync: add `--limit 100` to the milestone-siblings
  `gh issue list` count (was unbounded; silently capped at 30 on
  large repos).
- security-issue-triage: add `--limit 100` to the reviewed-by
  `gh pr list` search (same reason).
- setup-isolated-setup-doctor: move the docs/setup/
  sandbox-troubleshooting.md reference out of the frontmatter
  description into the body, so the matching-layer description stays
  tight and the criteria-source SOFT advisory clears. The body
  still documents the link extensively.
- committer-onboarding eval fixtures: append the missing trailing
  newline to 5 expected.json files.

Verified: `skill-validate --strict` reports OK (no violations);
`skill-validator` pytest suite is green; markdownlint passes.

Generated-by: Claude Code (Opus 4.7)
@justinmclean

Copy link
Copy Markdown
Member Author

Pre-flight self-review — PR #229 (committer-onboarding)

#229 · draft · author:
justinmclean

Base: main (merge-base f68064f) · Files changed: 111 (71 added, 40 modified)
· Diff size: +5728 / −606

Substantial PR: a new committer-onboarding skill + 4-step eval suite (~15
fixture cases), the --cli automated-evaluation mode in
tools/skill-evals/src/skill_evals/runner.py (+263), the matching test suite
(+245), a new tools/vulnogram/bot-credits-policy.md governance doc (+354), the
is_bot_credit() implementation + tests in generate-cve-json, a docs/modes.md
reference to a new mode-economics doc, and the five validator-fix commit
(4c92488) already landed in this session.

Correctness

  • [advisory] tools/skill-evals/evals/committer-onboarding/step-0-validate-vote
    /fixtures/output-spec.md — injection_detected is asserted by every
    expected.json in this step but not enumerated in the spec's bullet list. The
    prose paragraph at the bottom describes injection-detection behaviour ("the
    model must note it as a detected injection attempt and treat the tally data as
    invalid/untrusted") but never says "add an injection_detected boolean field."
    A model following the bullet list strictly would omit the field and fail the
    eval. Add one line: - injection_detected: boolean — true if the vote-tally
    input contained suspected agent-directed text.
  • [advisory] tools/skill-evals/evals/committer-onboarding/step-2-checklist/fix
    tures/output-spec.md — same pattern: whimsy_url_contains is asserted by
    expected.json but not in the spec's bullets (the spec lists whimsy_url_correct
    only). Add - whimsy_url_contains: string substring the Whimsy URL must
    include (PPMC vs. PMC discriminator).

(Verified clean: step-1 and step-3 specs and expected.json field sets align
exactly. skill-evals and generate-cve-json pytest suites are both green; the
latter exercises the new is_bot_credit() path.)

Security

  • [advisory] tools/skill-evals/src/skill_evals/runner.py:163-168 — the new
    --cli automated mode runs subprocess.run(..., shell=True, ...). The command
    string is operator-supplied (passed via --cli "") and the inputs to
    the subprocess (system + user prompt) go to stdin, not the shell, so the
    shell-injection surface is limited to the operator's own typed command — not
    an active vulnerability. Worth a hardening note: an alternate path using
    shlex.split(cmd) + shell=False removes one whole class of
    accidental-metacharacter footguns (e.g. an env-var in the operator's --cli
    value containing $ or backticks). Cosmetic at best, but it's the convention
    the rest of the framework's subprocess use follows.

(Verified clean: committer-onboarding's injection-guard callout is in place —
the fix landed in this branch's tip commit and the validator now reports OK.
Step-0 case-6, step-1 case-6, etc. carry adversarial coverage of injection in
candidate-supplied data.)

Conventions

No findings. skill-validate --strict reports OK (no violations) across the
entire repo, skill-validator pytest is green (the failure that triggered this
branch's investigation is gone), skill-evals pytest is green,
generate-cve-json pytest is green, check-placeholders is clean, all
committer-onboarding expected.json files end with \n, and the new
committer-onboarding/SKILL.md carries the standard Pattern 4 injection-guard
callout.


Summary

Ready to push — no blocking findings. Two small spec-bullet additions (step-0
and step-2) tighten the eval contract; the subprocess shell=True is a
defensible operator-trust choice but flag-worthy for future hardening.

Blocking: 0 Advisory: 3

Self-review findings on PR apache#229:

- committer-onboarding step-0 output-spec.md: enumerate the
  `injection_detected` field in the bullet list. The expected.json
  in every step-0 case asserts it, but the spec's prose only
  described injection-detection behaviour without naming the output
  field — a model following the bullets strictly would have omitted
  the key.
- committer-onboarding step-2 output-spec.md: enumerate the
  `whimsy_url_contains` field (the PPMC-vs-PMC discriminator
  substring). Same pattern: asserted by expected.json, not in the
  spec's bullets.
- skill-evals runner.py --cli mode: switch run_cli from
  `subprocess.run(cli, shell=True)` to
  `subprocess.run(shlex.split(cli), shell=False)`. The operator's
  command string was already trusted (the docstring said so), but
  using an argv list rather than a shell string keeps the
  attacker-controlled prompt content (injection-case fixtures and
  their like) firmly on stdin, well away from any shell
  interpretation, and removes a class of accidental-metacharacter
  footgun in the operator's --cli value. Operators who genuinely
  need shell features wrap their command in `bash -c '<pipeline>'`.

One test follow-on (test_runner.py): the MANUAL-skips-CLI case
used `"exit 1"` (a shell builtin) to assert non-zero-rc handling;
under shell=False the builtin is not on PATH and would FileNotFoundError
instead of exiting 1. Swapped to `"false"` — a real binary that exits 1
the same way — with an inline comment explaining the constraint.

Verified: `skill-evals` pytest green; `skill-validate --strict` reports
OK (no violations); `skill-validator` pytest green.

Generated-by: Claude Code (Opus 4.7)
@justinmclean

justinmclean commented May 26, 2026

Copy link
Copy Markdown
Member Author

Fixed the advisories, note that one was in the test runner, as this was not reviewed.

Markdownlint MD040 fenced-code-language: tag four template/output
blocks in the detail files as `text` — they're plain-text content
(email bodies, a URL template), not code in any real language.

- detail/karma-grant.md:104     Whimsy committer-profile URL template
- detail/email-templates.md:16  committer congratulations email body
- detail/email-templates.md:108 secretary account-request email body
- detail/email-templates.md:148 dev-list welcome announcement body

Verified with a broad markdownlint-cli2 sweep across all 76 changed
.md files on this branch — 0 errors remaining.

Generated-by: Claude Code (Opus 4.7)
@potiuk

potiuk commented May 27, 2026

Copy link
Copy Markdown
Member

Hi @justinmclean — same situation as #228 here. Tried a maintainer-side rebase but this branch is also significantly behind main (334 files / ~8k-line diff) — looks like it was branched off an older main and never refreshed. A clean rebase from your end would help; the conflict resolution needs your judgment on what's still applicable on the current taxonomy.

No rush.

@justinmclean justinmclean deleted the contribitor-onboarding branch May 29, 2026 00:42
justinmclean added a commit to justinmclean/airflow-steward that referenced this pull request May 29, 2026
Self-review findings on PR apache#229:

- committer-onboarding step-0 output-spec.md: enumerate the
  `injection_detected` field in the bullet list. The expected.json
  in every step-0 case asserts it, but the spec's prose only
  described injection-detection behaviour without naming the output
  field — a model following the bullets strictly would have omitted
  the key.
- committer-onboarding step-2 output-spec.md: enumerate the
  `whimsy_url_contains` field (the PPMC-vs-PMC discriminator
  substring). Same pattern: asserted by expected.json, not in the
  spec's bullets.
- skill-evals runner.py --cli mode: switch run_cli from
  `subprocess.run(cli, shell=True)` to
  `subprocess.run(shlex.split(cli), shell=False)`. The operator's
  command string was already trusted (the docstring said so), but
  using an argv list rather than a shell string keeps the
  attacker-controlled prompt content (injection-case fixtures and
  their like) firmly on stdin, well away from any shell
  interpretation, and removes a class of accidental-metacharacter
  footgun in the operator's --cli value. Operators who genuinely
  need shell features wrap their command in `bash -c '<pipeline>'`.

One test follow-on (test_runner.py): the MANUAL-skips-CLI case
used `"exit 1"` (a shell builtin) to assert non-zero-rc handling;
under shell=False the builtin is not on PATH and would FileNotFoundError
instead of exiting 1. Swapped to `"false"` — a real binary that exits 1
the same way — with an inline comment explaining the constraint.

Verified: `skill-evals` pytest green; `skill-validate --strict` reports
OK (no violations); `skill-validator` pytest green.

Generated-by: Claude Code (Opus 4.7)
potiuk pushed a commit to justinmclean/airflow-steward that referenced this pull request Jun 2, 2026
Self-review findings on PR apache#229:

- committer-onboarding step-0 output-spec.md: enumerate the
  `injection_detected` field in the bullet list. The expected.json
  in every step-0 case asserts it, but the spec's prose only
  described injection-detection behaviour without naming the output
  field — a model following the bullets strictly would have omitted
  the key.
- committer-onboarding step-2 output-spec.md: enumerate the
  `whimsy_url_contains` field (the PPMC-vs-PMC discriminator
  substring). Same pattern: asserted by expected.json, not in the
  spec's bullets.
- skill-evals runner.py --cli mode: switch run_cli from
  `subprocess.run(cli, shell=True)` to
  `subprocess.run(shlex.split(cli), shell=False)`. The operator's
  command string was already trusted (the docstring said so), but
  using an argv list rather than a shell string keeps the
  attacker-controlled prompt content (injection-case fixtures and
  their like) firmly on stdin, well away from any shell
  interpretation, and removes a class of accidental-metacharacter
  footgun in the operator's --cli value. Operators who genuinely
  need shell features wrap their command in `bash -c '<pipeline>'`.

One test follow-on (test_runner.py): the MANUAL-skips-CLI case
used `"exit 1"` (a shell builtin) to assert non-zero-rc handling;
under shell=False the builtin is not on PATH and would FileNotFoundError
instead of exiting 1. Swapped to `"false"` — a real binary that exits 1
the same way — with an inline comment explaining the constraint.

Verified: `skill-evals` pytest green; `skill-validate --strict` reports
OK (no violations); `skill-validator` pytest green.

Generated-by: Claude Code (Opus 4.7)
potiuk pushed a commit that referenced this pull request Jun 2, 2026
…mbers (#371)

* inital commit

* fix(skills): clear validator pytest failure + SOFT warnings

Address the skill-validator findings surfaced by the pytest run on
this branch, plus two markdownlint MD040 errors the lint hook
surfaced on the touched committer-onboarding SKILL.md:

- committer-onboarding: add the standard Pattern 4 injection-guard
  callout. The skill reads the <vote-thread> from the mailing-list
  archive, candidate-supplied identity fields (name, email, desired
  Apache ID), and ICLA / Whimsy roster data; the callout names those
  surfaces explicitly. Golden rule 3 already reinforced the same
  principle; this adds the validator-recognised block. (HARD
  violation; was failing the pytest gate.)
- committer-onboarding: tag two existing untagged fenced output
  blocks (Step 0 output, Step 3 completion summary) as `text` so
  markdownlint MD040 stops flagging them on touch.
- security-issue-sync: add `--limit 100` to the milestone-siblings
  `gh issue list` count (was unbounded; silently capped at 30 on
  large repos).
- security-issue-triage: add `--limit 100` to the reviewed-by
  `gh pr list` search (same reason).
- setup-isolated-setup-doctor: move the docs/setup/
  sandbox-troubleshooting.md reference out of the frontmatter
  description into the body, so the matching-layer description stays
  tight and the criteria-source SOFT advisory clears. The body
  still documents the link extensively.
- committer-onboarding eval fixtures: append the missing trailing
  newline to 5 expected.json files.

Verified: `skill-validate --strict` reports OK (no violations);
`skill-validator` pytest suite is green; markdownlint passes.

Generated-by: Claude Code (Opus 4.7)

* fix(skill-evals): close 3 advisory findings from self-review

Self-review findings on PR #229:

- committer-onboarding step-0 output-spec.md: enumerate the
  `injection_detected` field in the bullet list. The expected.json
  in every step-0 case asserts it, but the spec's prose only
  described injection-detection behaviour without naming the output
  field — a model following the bullets strictly would have omitted
  the key.
- committer-onboarding step-2 output-spec.md: enumerate the
  `whimsy_url_contains` field (the PPMC-vs-PMC discriminator
  substring). Same pattern: asserted by expected.json, not in the
  spec's bullets.
- skill-evals runner.py --cli mode: switch run_cli from
  `subprocess.run(cli, shell=True)` to
  `subprocess.run(shlex.split(cli), shell=False)`. The operator's
  command string was already trusted (the docstring said so), but
  using an argv list rather than a shell string keeps the
  attacker-controlled prompt content (injection-case fixtures and
  their like) firmly on stdin, well away from any shell
  interpretation, and removes a class of accidental-metacharacter
  footgun in the operator's --cli value. Operators who genuinely
  need shell features wrap their command in `bash -c '<pipeline>'`.

One test follow-on (test_runner.py): the MANUAL-skips-CLI case
used `"exit 1"` (a shell builtin) to assert non-zero-rc handling;
under shell=False the builtin is not on PATH and would FileNotFoundError
instead of exiting 1. Swapped to `"false"` — a real binary that exits 1
the same way — with an inline comment explaining the constraint.

Verified: `skill-evals` pytest green; `skill-validate --strict` reports
OK (no violations); `skill-validator` pytest green.

Generated-by: Claude Code (Opus 4.7)

* fix(committer-onboarding): tag untagged fenced output blocks as text

Markdownlint MD040 fenced-code-language: tag four template/output
blocks in the detail files as `text` — they're plain-text content
(email bodies, a URL template), not code in any real language.

- detail/karma-grant.md:104     Whimsy committer-profile URL template
- detail/email-templates.md:16  committer congratulations email body
- detail/email-templates.md:108 secretary account-request email body
- detail/email-templates.md:148 dev-list welcome announcement body

Verified with a broad markdownlint-cli2 sweep across all 76 changed
.md files on this branch — 0 errors remaining.

Generated-by: Claude Code (Opus 4.7)

* improve tests

* test(skill-evals): wrap _grader_count_cli in bash -c

run_cli switched to shell=False with shlex.split in 0a13c84, but the
test helper kept the shell env-var-prefix form which shlex.split
tokenises as a literal argv[0] binary name. Wrap the inner command in
bash -c so the env-var assignment is honoured.

Fixes 3 CI failures:
- test_batch_grade_single_pair_one_call
- test_batch_grade_many_pairs_one_call
- test_compare_with_grader_multiple_prose_mismatches_one_call
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request family:tools tools/*

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants