sanitize: drop Airflow remnants; synthesize fictional adopter for tests #7

Merged
potiuk merged 2 commits into main from sanitize-airflow-references on Apr 29, 2026

@potiuk (Member) commented Apr 29, 2026

Summary

Comprehensive sanitization pass — the framework now talks about itself in fully project-agnostic terms; the Airflow-specific scaffolding (test fixture, package-naming infix, hardcoded CVE-record fields) is gone.

33 files changed, +519 −562 across two commits. All 139 tests pass; all prek hooks green.

Two commits

Commit 1 — docs+code: sanitize Airflow remnants; synthesize fictional adopter for tests

  • Tier 1 (top-level docs): README, CONTRIBUTING, new-members-onboarding, projects/_template/*.md — drop hardcoded Airflow scope-label lists, board URLs, projects/airflow/ references; replace with <tracker> / <project-config> placeholders.
  • Tier 2 (skill files + tool docs): ~30 substitutions across 7 SKILL.md files and tools/{gmail,github,ponymail}/*.md. airflow-s standalone references → <tracker>; "Apache Airflow PMC" → "the project's PMC"; "the Airflow Security Model" → "the project's security model"; etc.
  • Tier 3 (cve_json.py config-driven refactor): three hardcoded Airflow values were bypassing the otherwise-config-driven design — wrap_cve_record's CNA_private envelope (projecturl/owner/userslist), build_references' URL filter, and resolve_title's strip regex. All three now config-driven via a new [cna_private] section + TRACKER_FILTER_TOKEN + TITLE_STRIP_RE constants. Also removed an orphaned SKILL_SOURCE_URL assignment that was overriding the config-loaded value.
  • Q1(b): tests/fixtures/cve-json-config.toml rewritten as a fictional "Apache Example" project; 135 sed-driven assertion updates in test_generate_cve_json.py.
  • Q2: pyproject.toml package name apache-airflow-steward → apache-steward.

Commit 2 — sanitize: rename 'provider' → 'project'; switch test data to fictional names

The first commit left two adopter-shape leaks the user caught:

  • Terminology: the codebase used "provider" throughout, a term inherited from Airflow's apache-airflow-providers-* layout. Renamed to "project" everywhere (constants, locals, regex named group, comments, config keys, package-name infix -providers- → -project-). Untouched: the upstream CVE 5.x schema field providerMetadata.

  • Fictional test data: the test fixture's display map still carried real ASF / external software names (elasticsearch, snowflake, cncf-kubernetes, opensearch, smtp). Replaced with placeholders that cover the same test paths:

    | Old (real) | New (fictional) | Tests |
    |---|---|---|
    | cncf-kubernetes → CNCF Kubernetes | acme-xyz → Acme XYZ | multi-word + acronym |
    | elasticsearch → Elasticsearch | foo → Foo | single-word identity |
    | opensearch → OpenSearch | kerfluffle → Kerfluffle | single-word mixed-case |
    | smtp → SMTP | xyz → XYZ | single-word acronym |
    | snowflake → Snowflake | bar → Bar | single-word identity |
    | (none) | pop-corn → Pop Corn | multi-word title-case |

    brand-new / madeup-widget (unmapped-fallback test names) were already fictional, kept.

Out of scope (follow-up PRs)

Airflow-specific procedural content still in sync-security-issue/SKILL.md and allocate-cve/SKILL.md (release-train naming, scope-label sets, milestone formats, title-normalization regex example) — needs extraction to per-project config; separate PR.

Test plan

  • ✅ 139 tests pass (134 generate-cve-json, 58 oauth-draft).
  • prek run --all-files passes (16 hooks).
  • zizmor clean.
  • grep -i provider against the source/tests yields only the upstream providerMetadata schema reference.
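
The grep check above can be approximated in Python. This is a minimal sketch, not part of the PR: the `leftover_lines` helper and the sample text are hypothetical, and the only assumption taken from the PR is that `providerMetadata` is the sole allowed occurrence.

```python
def leftover_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that still mention "provider".

    The upstream CVE schema field `providerMetadata` is allowed, so it is
    removed from each line before the case-insensitive check.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        remainder = line.replace("providerMetadata", "")
        if "provider" in remainder.lower():
            hits.append((number, line))
    return hits


# Hypothetical sample input, for illustration only.
SAMPLE = 'record["providerMetadata"] = meta\nPROVIDER_PREFIX = "x"\nPROJECT_PREFIX = "y"'
```

Running `leftover_lines(SAMPLE)` flags only the second line, which is the kind of leftover the rename is meant to eliminate.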

🤖 Generated with Claude Code

potiuk added 2 commits April 29, 2026 03:50
…r tests

Comprehensive sanitization pass across the framework's documentation
and the generate-cve-json Python package, plus a config-driven
refactor of three previously-hardcoded Airflow values in cve_json.py.

## Documentation (Tier 1 + Tier 2)

Top-level docs:

- `README.md`: drop the broken `Current projects` table (no
  per-project subdirs in this repo); reframe the legacy
  `apache/airflow-steward` repo identity; replace hardcoded
  `airflow / providers / chart` scope-label list with a pointer to
  `<project-config>/scope-labels.md`; replace the `airflow-s` grep
  blocklist token with `<tracker>`; replace `airflow-s#XYZ` with
  `<tracker>#XYZ`; rewrite `Adopting the framework` section.
- `CONTRIBUTING.md`: drop the per-project tree section that
  documented a no-longer-existing `projects/airflow/` subtree; rewrite
  the file index for the framework-only layout; retarget the base
  branch from `airflow-s` to `main`; replace per-project guidance
  with adopter-side guidance.
- `new-members-onboarding.md`: replace 4 hardcoded
  `projects/airflow/` and `airflow-s/projects/2` URLs with
  `<project-config>` pointers and generic phrasing.
- `projects/_template/{README,canned-responses,release-trains,
  title-normalization}.md`: drop cross-references to a
  no-longer-existing `../airflow/` template peer; inline the
  shapes/example wording previously hosted there.

Tool docs:

- `tools/gmail/asf-relay.md`, `tools/ponymail/tool.md`: replace
  Airflow-specific PMC LDAP / mailing-list values with
  `<project-config>` pointers and generic phrasing.
- `tools/gmail/{operations,search-queries,draft-backends}.md`,
  `tools/github/{project-board,tool}.md`: replace hardcoded
  `airflow-s` / `airflow-s/projects/2` / `airflow-s@noreply.github.com`
  references with `<tracker>` placeholders / project-config
  pointers / a new `<tracker-noreply>` placeholder.

Skill files (`.claude/skills/*/SKILL.md`):

- Replace `airflow-s` standalone references with `<tracker>` /
  `<tracker> issue` / `<tracker> tracking issue` /
  `/blob/<tracker-default-branch>/` everywhere it isn't an
  `(example: airflow-s/airflow-s …)` framing line. Total: 7 skill
  files touched, ~30 substitutions.
- Reframe Airflow-as-the-assumed-adopter language: "Apache Airflow
  PMC" → "the project's PMC", "the Airflow security team" → "the
  project security team", "the Airflow Security Model" → "the
  project's security model", "the Airflow Release Plan wiki" → "the
  project's release plan", etc.

Skill files, out of scope here: the deeper Airflow-specific procedural
content (release-train naming, scope-label sets, milestone formats,
the title-normalization regex example) is intentionally left for a
follow-up that moves it into per-project config under
`<project-config>/`. Refactoring it changes skill behaviour and
needs its own PR.

`AGENTS.md`: 3 prose tweaks to drop "Airflow's", "for all Airflow
CVEs", "the Airflow process", "the Airflow release train" in
contexts that treated Airflow as the assumed project.

`tools/vulnogram/generate-cve-json/SKILL.md`: rewrite the preamble
to flag concrete `apache-foo-providers-*` strings as illustrative
examples rather than Airflow defaults; rewrite multi-product /
provider-display-map / `< NEXT VERSION` sections to refer to
adopter-supplied config rather than a built-in
`AIRFLOW_PROVIDER_DISPLAY_MAP` (which the code no longer has; the
doc was stale).

## Code (Tier 3 — config-driven refactor)

`tools/vulnogram/generate-cve-json/src/generate_cve_json/cve_json.py`
had three hardcoded Airflow values that bypassed the otherwise-
config-driven design:

- `wrap_cve_record` hardcoded `CNA_private.projecturl =
  "https://airflow.apache.org/"`, `owner = "airflow"`,
  `userslist = "users@airflow.apache.org"` into every emitted CVE
  record.
- `build_references` hardcoded the `"airflow-s" not in url` filter,
  so adopters' tracker URLs would leak into published CVE records
  while Airflow's wouldn't.
- `resolve_title` hardcoded the `r"^\s*apache\s+airflow\s*[:\-...]?
  \s*"` strip regex, so the title-prefix scrub only worked for
  Airflow-titled issues.

All three are now config-driven:

- New `[cna_private]` config section with `project_url`, `owner`,
  `users_list` fields. Loaded into `CNA_PRIVATE_PROJECT_URL`,
  `CNA_PRIVATE_OWNER`, `CNA_PRIVATE_USERS_LIST` constants.
- New `TRACKER_FILTER_TOKEN` constant derived from `meta.tracker_repo`
  (uses the org segment) — `build_references` filters with that.
- New `TITLE_STRIP_RE` constant compiled at config-load time from the
  configured `top_level_product` — `resolve_title` uses it.

Also removed an orphaned `SKILL_SOURCE_URL = "https://github.com/
airflow-s/airflow-s/..."` assignment at module level that was
overriding the config-loaded value (bug fix).

## Test fixture (Q1b — fictional adopter)

`tests/fixtures/cve-json-config.toml` replaced its Airflow-shaped
values with a fictional "Apache Example" project: `apache-example`
top-level package, `apache-example-providers-<name>` provider
layout, `apache-example-s/apache-example-s` tracker repo,
`example.apache.org` project URL, etc. The provider display map is
trimmed to the providers the tests actually exercise (5 entries:
cncf-kubernetes, elasticsearch, opensearch, smtp, snowflake);
unknown providers fall back to the title-cased dash-split path.

`tests/test_generate_cve_json.py` (135 assertions touched via sed):
all `Apache Airflow` → `Apache Example`, `apache-airflow` →
`apache-example`, `apache-airflow-providers-*` →
`apache-example-providers-*`, `airflow-s/airflow-s` →
`apache-example-s/apache-example-s`, plus 1 manual update for the
CNA_private envelope assertion (now `example` /
`users@example.apache.org`).

## Other

- `pyproject.toml`: rename root package `apache-airflow-steward` →
  `apache-steward` (the future canonical name; `name` field is
  internal-only, no PyPI publication). Rationale captured in a
  comment.
- `uv.lock` regenerated.

## Test plan

- 139 tests pass across both Python projects (134 generate-cve-json,
  58 oauth-draft).
- `prek run --all-files` passes (all 16 hooks).
- `zizmor` clean.

Generated-by: Claude Code (Claude Opus 4.7)
…l names

Follow-up to the previous sanitization commit. The earlier pass had
left two adopter-shape leaks in the generate-cve-json package:

1. The codebase used **"provider"** throughout (PROVIDER_DISPLAY_MAP,
   provider_product_template, provider_dir, etc.) — a term that comes
   straight from Apache Airflow's `apache-airflow-providers-*`
   sub-package layout. Renamed to **"project"** so the framework
   doesn't bake in any one adopter's terminology.
2. The test fixture's `project_display_map` (was: provider map)
   contained the names of real ASF providers / external tools
   (`elasticsearch`, `snowflake`, `cncf-kubernetes`, `opensearch`,
   `smtp`). Replaced with clearly-fictional placeholders that
   exercise the same code paths.

## Renames in cve_json.py

- Constants: `PROVIDER_DISPLAY_MAP` → `PROJECT_DISPLAY_MAP`,
  `PROVIDER_PRODUCT_TEMPLATE` → `PROJECT_PRODUCT_TEMPLATE`,
  `PROVIDER_PREFIX` → `PROJECT_PREFIX`.
- Local vars: `provider_dir` → `project_dir`, etc.
- Package-name infix: `-providers-` → `-project-` (so the regex now
  matches `apache-example-project-foo` instead of
  `apache-example-providers-foo`).
- Comments / docstrings: every "provider" / "Provider" / "providers"
  → "project" / "Project" / "projects".
- Re-export from `__init__.py` updated to match.
- **Untouched:** the CVE 5.x schema field name `providerMetadata`
  (line 909, line 53 docstring). That's the upstream schema, not
  our terminology.
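
As an illustration of the infix and named-group rename, a minimal sketch
follows. The pattern and the `project_slug` helper are hypothetical: the
actual regex in `cve_json.py` is not reproduced in this commit message,
and in the real code the package prefix is presumably built from config
rather than hardcoded as it is here.

```python
import re
from typing import Optional

# Hypothetical pattern using the renamed infix and named group.
PROJECT_PACKAGE_RE = re.compile(
    r"^apache-example-project-(?P<project>[a-z0-9][a-z0-9-]*)$"
)


def project_slug(package_name: str) -> Optional[str]:
    """Return the sub-project slug, or None if the name does not match."""
    match = PROJECT_PACKAGE_RE.match(package_name)
    return match.group("project") if match else None
```

Under this sketch `apache-example-project-foo` yields `foo`, while the old
`apache-example-providers-foo` spelling no longer matches.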

## Renames in the test fixture

- TOML key: `[packages.provider_display_map]` → `[packages.project_display_map]`.
- TOML key: `provider_product_template` → `project_product_template`.
- Template value: `"Apache Example Providers {display}"` →
  `"Apache Example Project {display}"`.
- Regex named group: `(?P<provider>...)` → `(?P<project>...)`.
- Display map entries — replaced 5 real-software names with
  fictional placeholders that cover the same test cases:

  | Old (real software) | New (fictional) | Test purpose |
  |---|---|---|
  | `cncf-kubernetes` → `CNCF Kubernetes` | `acme-xyz` → `Acme XYZ` | multi-word + acronym |
  | `elasticsearch` → `Elasticsearch` | `foo` → `Foo` | single-word identity |
  | `opensearch` → `OpenSearch` | `kerfluffle` → `Kerfluffle` | single-word mixed-case |
  | `smtp` → `SMTP` | `xyz` → `XYZ` | single-word acronym |
  | `snowflake` → `Snowflake` | `bar` → `Bar` | single-word identity |
  |   (none) | `pop-corn` → `Pop Corn` | multi-word title-case |

  `brand-new` and `madeup-widget` (used by tests for the unmapped-
  fallback path) were already fictional and stay as-is.
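
A minimal sketch of how the display map and the unmapped fallback could
interact, using the fictional entries from the table above. The
`display_name` and `product_name` helpers are hypothetical names; the
fallback follows the "title-cased dash-split" behaviour described in the
previous commit, and the real implementation may differ in detail.

```python
# Values mirror the fictional fixture entries listed in the table.
PROJECT_DISPLAY_MAP = {
    "acme-xyz": "Acme XYZ",
    "foo": "Foo",
    "kerfluffle": "Kerfluffle",
    "xyz": "XYZ",
    "bar": "Bar",
    "pop-corn": "Pop Corn",
}
PROJECT_PRODUCT_TEMPLATE = "Apache Example Project {display}"


def display_name(slug: str) -> str:
    """Look up a display name, falling back to title-cased dash-split words."""
    mapped = PROJECT_DISPLAY_MAP.get(slug)
    if mapped is not None:
        return mapped
    return " ".join(part.capitalize() for part in slug.split("-"))


def product_name(slug: str) -> str:
    """Render the full product name for a sub-project slug."""
    return PROJECT_PRODUCT_TEMPLATE.format(display=display_name(slug))
```

The map matters for acronyms: without the `acme-xyz` entry, the fallback
would produce "Acme Xyz" rather than "Acme XYZ".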

## Test updates

`test_generate_cve_json.py`: 60+ substitutions to follow the
renames. All 139 tests pass against the new fixture.

## SKILL.md

Same terminology pass: "provider directory" → "project directory",
"providers trackers" → "project trackers", etc. The illustrative
package examples (`apache-foo-providers-elasticsearch`) are now
`apache-foo-project-alpha` / `apache-foo-project-beta` — fictional
sub-project names instead of real software references.

## Test plan

- ✅ 139 tests pass (134 generate-cve-json, 58 oauth-draft).
  Grepping for "provider" picks up nothing except the upstream
  `providerMetadata` CVE schema field.
- ✅ `prek run --all-files` passes (16 hooks).
- ✅ `zizmor` clean.

Generated-by: Claude Code (Claude Opus 4.7)
@potiuk potiuk merged commit b73e240 into main Apr 29, 2026
6 checks passed
@potiuk potiuk deleted the sanitize-airflow-references branch April 29, 2026 02:04
@andreahlert andreahlert added the mode:platform Substrate / infra — not a mode (sandbox, CI, validators) label May 7, 2026