sanitize: drop Airflow remnants; synthesize fictional adopter for tests #7
Merged
…r tests
Comprehensive sanitization pass across the framework's documentation
and the generate-cve-json Python package, plus a config-driven
refactor of three previously-hardcoded Airflow values in cve_json.py.
## Documentation (Tier 1 + Tier 2)
Top-level docs:
- `README.md`: drop the broken `Current projects` table (no
per-project subdirs in this repo); reframe the legacy
`apache/airflow-steward` repo identity; replace hardcoded
`airflow / providers / chart` scope-label list with a pointer to
`<project-config>/scope-labels.md`; replace the `airflow-s` grep
blocklist token with `<tracker>`; replace `airflow-s#XYZ` with
`<tracker>#XYZ`; rewrite `Adopting the framework` section.
- `CONTRIBUTING.md`: drop the per-project tree section that
documented a no-longer-existing `projects/airflow/` subtree; rewrite
the file index for the framework-only layout; retarget the base
branch from `airflow-s` to `main`; replace per-project guidance
with adopter-side guidance.
- `new-members-onboarding.md`: replace 4 hardcoded
`projects/airflow/` and `airflow-s/projects/2` URLs with
`<project-config>` pointers and generic phrasing.
- `projects/_template/{README,canned-responses,release-trains,
title-normalization}.md`: drop cross-references to a
no-longer-existing `../airflow/` template peer; inline the
shapes/example wording previously hosted there.
Tool docs:
- `tools/gmail/asf-relay.md`, `tools/ponymail/tool.md`: replace
Airflow-specific PMC LDAP / mailing-list values with
`<project-config>` pointers and generic phrasing.
- `tools/gmail/{operations,search-queries,draft-backends}.md`,
`tools/github/{project-board,tool}.md`: replace hardcoded
`airflow-s` / `airflow-s/projects/2` / `airflow-s@noreply.github.com`
references with `<tracker>` placeholders / project-config
pointers / a new `<tracker-noreply>` placeholder.
Skill files (`.claude/skills/*/SKILL.md`):
- Replace `airflow-s` standalone references with `<tracker>` /
`<tracker> issue` / `<tracker> tracking issue` /
`/blob/<tracker-default-branch>/` everywhere it isn't an
`(example: airflow-s/airflow-s …)` framing line. Total: 7 skill
files touched, ~30 substitutions.
- Reframe Airflow-as-the-assumed-adopter language: "Apache Airflow
PMC" → "the project's PMC", "the Airflow security team" → "the
project security team", "the Airflow Security Model" → "the
project's security model", "the Airflow Release Plan wiki" → "the
project's release plan", etc.
Out of scope for the skill files: the deeper Airflow-specific
procedural content (release-train naming, scope-label sets, milestone
formats, the title-normalization regex example) is intentionally left
for a follow-up that moves it into per-project config under
`<project-config>/`. Refactoring it changes skill behaviour and
needs its own PR.
`AGENTS.md`: 3 prose tweaks to drop "Airflow's", "for all Airflow
CVEs", "the Airflow process", "the Airflow release train" in
contexts that treated Airflow as the assumed project.
`tools/vulnogram/generate-cve-json/SKILL.md`: rewrite the preamble
to flag concrete `apache-foo-providers-*` strings as illustrative
examples rather than Airflow defaults; rewrite the multi-product /
provider-display-map / `< NEXT VERSION` sections to refer to
adopter-supplied config rather than a built-in
`AIRFLOW_PROVIDER_DISPLAY_MAP` (stale documentation: the code no
longer defines it).
## Code (Tier 3 — config-driven refactor)
`tools/vulnogram/generate-cve-json/src/generate_cve_json/cve_json.py`
had three hardcoded Airflow values that bypassed the otherwise-
config-driven design:
- `wrap_cve_record` hardcoded `CNA_private.projecturl =
"https://airflow.apache.org/"`, `owner = "airflow"`,
`userslist = "users@airflow.apache.org"` into every emitted CVE
record.
- `build_references` hardcoded the `"airflow-s" not in url` filter,
so adopters' tracker URLs would leak into published CVE records
while Airflow's wouldn't.
- `resolve_title` hardcoded the
`r"^\s*apache\s+airflow\s*[:\-...]?\s*"` strip regex, so the
title-prefix scrub only worked for Airflow-titled issues.
All three are now config-driven:
- New `[cna_private]` config section with `project_url`, `owner`,
`users_list` fields. Loaded into `CNA_PRIVATE_PROJECT_URL`,
`CNA_PRIVATE_OWNER`, `CNA_PRIVATE_USERS_LIST` constants.
- New `TRACKER_FILTER_TOKEN` constant derived from `meta.tracker_repo`
(uses the org segment) — `build_references` filters with that.
- New `TITLE_STRIP_RE` constant compiled at config-load time from the
configured `top_level_product` — `resolve_title` uses it.
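A minimal sketch of how the three constants could be derived at
config-load time. This is an illustration, not the actual `cve_json.py`
code: the dict stands in for the parsed TOML config, the placement of
`top_level_product` under `packages`, the URL scheme, and the function
bodies are assumptions, and the strip regex only handles the `:` and
`-` separators (the real character class is elided above).

```python
import re

# Stand-in for the parsed TOML config (hypothetical layout and values,
# modelled on the fictional "Apache Example" fixture).
config = {
    "meta": {"tracker_repo": "apache-example-s/apache-example-s"},
    "packages": {"top_level_product": "Apache Example"},
    "cna_private": {
        "project_url": "https://example.apache.org/",
        "owner": "example",
        "users_list": "users@example.apache.org",
    },
}

# [cna_private] fields become module-level constants.
CNA_PRIVATE_PROJECT_URL = config["cna_private"]["project_url"]
CNA_PRIVATE_OWNER = config["cna_private"]["owner"]
CNA_PRIVATE_USERS_LIST = config["cna_private"]["users_list"]

# The org segment of "org/repo" is the token used to filter tracker URLs.
TRACKER_FILTER_TOKEN = config["meta"]["tracker_repo"].split("/", 1)[0]

# Build a case-insensitive prefix pattern from the configured product name,
# tolerating any run of whitespace between its words.
_product = r"\s+".join(
    re.escape(word) for word in config["packages"]["top_level_product"].split()
)
TITLE_STRIP_RE = re.compile(rf"^\s*{_product}\s*[:\-]?\s*", re.IGNORECASE)


def build_references(urls):
    """Keep only URLs that do not point into the private tracker."""
    return [url for url in urls if TRACKER_FILTER_TOKEN not in url]


def resolve_title(raw_title):
    """Strip a leading '<product>:' or '<product> -' prefix from a title."""
    return TITLE_STRIP_RE.sub("", raw_title, count=1)
```

With a different adopter's config, the same two functions filter that
adopter's tracker URLs and strip that adopter's product prefix, with no
code change.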
Also removed an orphaned module-level assignment,
`SKILL_SOURCE_URL = "https://github.com/airflow-s/airflow-s/..."`,
that was overriding the config-loaded value (bug fix).
## Test fixture (Q1b — fictional adopter)
`tests/fixtures/cve-json-config.toml` replaced its Airflow-shaped
values with a fictional "Apache Example" project: `apache-example`
top-level package, `apache-example-providers-<name>` provider
layout, `apache-example-s/apache-example-s` tracker repo,
`example.apache.org` project URL, etc. The provider display map is
trimmed to the providers the tests actually exercise (5 entries:
cncf-kubernetes, elasticsearch, opensearch, smtp, snowflake);
unknown providers fall back to the title-cased dash-split path.
`tests/test_generate_cve_json.py` (135 assertions touched via sed):
all `Apache Airflow` → `Apache Example`, `apache-airflow` →
`apache-example`, `apache-airflow-providers-*` →
`apache-example-providers-*`, `airflow-s/airflow-s` →
`apache-example-s/apache-example-s`, plus 1 manual update for the
`CNA_private` envelope assertion (now `example` /
`users@example.apache.org`).
## Other
- `pyproject.toml`: rename root package `apache-airflow-steward` →
`apache-steward` (the future canonical name; `name` field is
internal-only, no PyPI publication). Rationale captured in a
comment.
- `uv.lock` regenerated.
## Test plan
- 139 tests pass across both Python projects (134 generate-cve-json,
58 oauth-draft).
- `prek run --all-files` passes (all 16 hooks).
- `zizmor` clean.
Generated-by: Claude Code (Claude Opus 4.7)
…l names
Follow-up to the previous sanitization commit. The earlier pass had
left two adopter-shape leaks in the generate-cve-json package:
1. The codebase used **"provider"** throughout (PROVIDER_DISPLAY_MAP,
provider_product_template, provider_dir, etc.) — a term that comes
straight from Apache Airflow's `apache-airflow-providers-*`
sub-package layout. Renamed to **"project"** so the framework
doesn't bake in any one adopter's terminology.
2. The test fixture's `project_display_map` (was: provider map)
contained the names of real ASF providers / external tools
(`elasticsearch`, `snowflake`, `cncf-kubernetes`, `opensearch`,
`smtp`). Replaced with clearly-fictional placeholders that
exercise the same code paths.
## Renames in cve_json.py
- Constants: `PROVIDER_DISPLAY_MAP` → `PROJECT_DISPLAY_MAP`,
`PROVIDER_PRODUCT_TEMPLATE` → `PROJECT_PRODUCT_TEMPLATE`,
`PROVIDER_PREFIX` → `PROJECT_PREFIX`.
- Local vars: `provider_dir` → `project_dir`, etc.
- Package-name infix: `-providers-` → `-project-` (so the regex now
matches `apache-example-project-foo` instead of
`apache-example-providers-foo`).
- Comments / docstrings: every "provider" / "Provider" / "providers"
→ "project" / "Project" / "projects".
- Re-export from `__init__.py` updated to match.
- **Untouched:** the CVE 5.x schema field name `providerMetadata`
(line 909, line 53 docstring). That's the upstream schema, not
our terminology.
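To illustrate the infix rename, here is a hedged sketch of a
package-name pattern with the `project` named group. The exact regex in
the fixture is not shown in this message, so the pattern and the
`project_slug` helper below are hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern: the "-project-" infix followed by a lowercase slug.
PACKAGE_RE = re.compile(
    r"^apache-example-project-(?P<project>[a-z0-9][a-z0-9-]*)$"
)


def project_slug(package_name: str) -> Optional[str]:
    """Return the sub-project slug, or None if the name does not match."""
    match = PACKAGE_RE.match(package_name)
    return match.group("project") if match else None
```

Names that still use the old `-providers-` infix, or the bare top-level
package name, do not match and yield `None`.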
## Renames in the test fixture
- TOML key: `[packages.provider_display_map]` → `[packages.project_display_map]`.
- TOML key: `provider_product_template` → `project_product_template`.
- Template value: `"Apache Example Providers {display}"` →
`"Apache Example Project {display}"`.
- Regex named group: `(?P<provider>...)` → `(?P<project>...)`.
- Display map entries — replaced 5 real-software names with
fictional placeholders that cover the same test cases:
| Old (real software) | New (fictional) | Test purpose |
|---|---|---|
| `cncf-kubernetes` → `CNCF Kubernetes` | `acme-xyz` → `Acme XYZ` | multi-word + acronym |
| `elasticsearch` → `Elasticsearch` | `foo` → `Foo` | single-word identity |
| `opensearch` → `OpenSearch` | `kerfluffle` → `Kerfluffle` | single-word mixed-case |
| `smtp` → `SMTP` | `xyz` → `XYZ` | single-word acronym |
| `snowflake` → `Snowflake` | `bar` → `Bar` | single-word identity |
| (none) | `pop-corn` → `Pop Corn` | multi-word title-case |
`brand-new` and `madeup-widget` (used by tests for the unmapped-
fallback path) were already fictional and stay as-is.
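The lookup these entries exercise can be sketched as follows. The map
and template values come from the fixture described above; the helper
names and the exact fallback implementation are assumptions.

```python
# Fictional display map from the test fixture.
PROJECT_DISPLAY_MAP = {
    "acme-xyz": "Acme XYZ",
    "foo": "Foo",
    "kerfluffle": "Kerfluffle",
    "xyz": "XYZ",
    "bar": "Bar",
    "pop-corn": "Pop Corn",
}
PROJECT_PRODUCT_TEMPLATE = "Apache Example Project {display}"


def display_name(slug: str) -> str:
    """Use the mapped name if present, else title-case the dash-split slug."""
    if slug in PROJECT_DISPLAY_MAP:
        return PROJECT_DISPLAY_MAP[slug]
    return " ".join(part.capitalize() for part in slug.split("-"))


def product_name(slug: str) -> str:
    """Render the full product name for a sub-project slug."""
    return PROJECT_PRODUCT_TEMPLATE.format(display=display_name(slug))
```

The map exists for cases the fallback cannot get right on its own:
`xyz` would fall back to `Xyz`, not `XYZ`.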
## Test updates
`test_generate_cve_json.py`: 60+ substitutions to follow the
renames. All 139 tests pass against the new fixture.
## SKILL.md
Same terminology pass: "provider directory" → "project directory",
"providers trackers" → "project trackers", etc. The illustrative
package examples (`apache-foo-providers-elasticsearch`) are now
`apache-foo-project-alpha` / `apache-foo-project-beta` — fictional
sub-project names instead of real software references.
## Test plan
- ✅ 139 tests pass (134 generate-cve-json, 58 oauth-draft).
- ✅ Grepping for "provider" finds only the upstream
`providerMetadata` CVE schema field.
- ✅ `prek run --all-files` passes (16 hooks).
- ✅ `zizmor` clean.
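The grep check could be automated along these lines. This is a sketch
only: the `leftover_lines` helper is hypothetical and not part of the
repository.

```python
import re

ALLOWED_TERM = "providerMetadata"  # upstream CVE 5.x schema field
TERM_RE = re.compile(r"provider", re.IGNORECASE)


def leftover_lines(text):
    """Return (line number, line) pairs that still mention 'provider'.

    Occurrences of the allowed schema field name are ignored.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        scrubbed = line.replace(ALLOWED_TERM, "")
        if TERM_RE.search(scrubbed):
            hits.append((lineno, line))
    return hits
```

Running it over each source and test file and asserting an empty result
would turn the manual check into a regression test.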
Generated-by: Claude Code (Claude Opus 4.7)
## Summary
Comprehensive sanitization pass: the framework now talks about itself in fully project-agnostic terms; the Airflow-specific scaffolding (test fixture, package-naming infix, hardcoded CVE-record fields) is gone.
33 files changed, +519 −562 across two commits. All 139 tests pass; all prek hooks green.
## Two commits
Commit 1: `docs+code: sanitize Airflow remnants; synthesize fictional adopter for tests`
- Docs: drop `projects/airflow/` references; replace with `<tracker>` / `<project-config>` placeholders.
- Skill files: `airflow-s` standalone references → `<tracker>`; "Apache Airflow PMC" → "the project's PMC"; "the Airflow Security Model" → "the project's security model"; etc.
- Code (`cve_json.py` config-driven refactor): three hardcoded Airflow values were bypassing the otherwise-config-driven design: `wrap_cve_record`'s `CNA_private` envelope (`projecturl` / `owner` / `userslist`), `build_references`' URL filter, and `resolve_title`'s strip regex. All three are now config-driven via a new `[cna_private]` section plus the `TRACKER_FILTER_TOKEN` and `TITLE_STRIP_RE` constants. Also removed an orphaned `SKILL_SOURCE_URL` assignment that was overriding the config-loaded value.
- `tests/fixtures/cve-json-config.toml` rewritten as a fictional "Apache Example" project; 135 sed-driven assertion updates in `test_generate_cve_json.py`.
- `pyproject.toml` package name `apache-airflow-steward` → `apache-steward`.

Commit 2: `sanitize: rename 'provider' → 'project'; switch test data to fictional names`
The first commit left two adopter-shape leaks the user caught:
- Terminology: the codebase used "provider" throughout, a term inherited from Airflow's `apache-airflow-providers-*` layout. Renamed to "project" everywhere (constants, locals, regex named group, comments, config keys, package-name infix `-providers-` → `-project-`). Untouched: the upstream CVE 5.x schema field `providerMetadata`.
- Fictional test data: the test fixture's display map still carried real ASF / external software names (`elasticsearch`, `snowflake`, `cncf-kubernetes`, `opensearch`, `smtp`). Replaced with placeholders that cover the same test paths:

| Old (real software) | New (fictional) |
|---|---|
| `cncf-kubernetes` → `CNCF Kubernetes` | `acme-xyz` → `Acme XYZ` |
| `elasticsearch` → `Elasticsearch` | `foo` → `Foo` |
| `opensearch` → `OpenSearch` | `kerfluffle` → `Kerfluffle` |
| `smtp` → `SMTP` | `xyz` → `XYZ` |
| `snowflake` → `Snowflake` | `bar` → `Bar` |
| (none) | `pop-corn` → `Pop Corn` |

`brand-new` / `madeup-widget` (unmapped-fallback test names) were already fictional, kept.
## Out of scope (follow-up PRs)
Airflow-specific procedural content still in `sync-security-issue/SKILL.md` and `allocate-cve/SKILL.md` (release-train naming, scope-label sets, milestone formats, title-normalization regex example) needs extraction to per-project config; separate PR.
## Test plan
- `prek run --all-files` passes (16 hooks).
- `zizmor` clean.
- `grep -i provider` against the source/tests yields only the upstream `providerMetadata` schema reference.

🤖 Generated with Claude Code