From d68a08f6bb8bfb8b4408feebc166ab56b0953bfb Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Sat, 30 May 2026 21:51:07 +0200 Subject: [PATCH] feat(security-cve-allocate): extend title-strip cascade with two patterns from session manual cleanups MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per direct observations from the airflow-s 2026-05-29/30 bulk sync — two recurring title-noise patterns were cleaned manually that the existing cascade did not catch: 1. Trailing prior-CVE-relationship parentheticals — the cross-CVE relationship is structurally captured by the Gate #3 cross-CVE clause in the public summary; embedding the relationship in the title is noise to downstream advisory consumers. Catches every shape observed in this session: - `(CVE-YYYY-NNNNN)` - `(possible CVE-YYYY-NNNNN variant)` — from #345 - `(incomplete fix for CVE-YYYY-NNNNN)` — from #351 - `(fix-bypass of CVE-YYYY-NNNNN)` — from #352 - and any other `(... CVE-YYYY-NNNNN ...)` shape 2. Trailing reporter-name attribution parentheticals — reporter attribution lives in the credits field, never in the public title. Pattern matches `( follow-up)` where `` matches name-like tokens (word chars, dots, hyphens, single inline spaces) to avoid over-stripping substantive technical content. Catches: - `(Evan Ricafort follow-up)` — from #346 Substantive technical parentheticals stay intact — e.g. the operator- name list `(GCSToSFTPOperator + GCSTimeSpanFileTransformOperator)` on the GCS path-traversal tracker is NOT stripped (it lacks a CVE ID and doesn't end in `follow-up`). The matching Step 1d signal row in security-issue-sync now enumerates the two new patterns so the proposal-time detector and the pre-push Gate #4 stay in lock-step with the cascade. Validated against 9 cases: 4 session-derived fixes (all pass), 3 synthetic CVE-relationship variants (all pass), 1 substantive technical parenthetical (preserved correctly), 1 " follow-up" edge case (stripped as designed — narrow scope acceptable since "follow-up" titles in airflow-s are exclusively reporter-attribution). Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/skills/security-cve-allocate/SKILL.md | 16 ++++++++++++++++ .claude/skills/security-issue-sync/SKILL.md | 2 +- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/.claude/skills/security-cve-allocate/SKILL.md b/.claude/skills/security-cve-allocate/SKILL.md index d4889e78..d52b0295 100644 --- a/.claude/skills/security-cve-allocate/SKILL.md +++ b/.claude/skills/security-cve-allocate/SKILL.md @@ -341,6 +341,22 @@ patterns_trailing = [ # brackets. Extend the alternation per project. r"[ \t]*(?:\[(?:ZDRES|HUNTR|GHSL)-[\w-]+\]|\((?:ZDRES|HUNTR|GHSL)-[\w-]+\))\.?[ \t]*$", r"[ \t]*\([^)]*split from #\d+[^)]*\)\.?[ \t]*$", + # Trailing parentheticals that mention a prior CVE ID. The cross- + # CVE relationship belongs in the public summary (Gate #3 + # cross-CVE clause), never in the title — the title ships as + # `containers.cna.title` and prior-CVE references read as noise + # to downstream advisory consumers. Catches every shape observed + # in airflow-s manual cleanups: `(CVE-...)`, + # `(possible CVE-... variant)`, `(incomplete fix for CVE-...)`, + # `(fix-bypass of CVE-...)`, `(CVE-... )`, etc. + r"[ \t]*\([^)]*\bCVE-\d{4}-\d{4,7}\b[^)]*\)\.?[ \t]*$", + # Trailing `( follow-up)` parenthetical. Reporter + # attribution belongs in the credits field, never in the public + # title. The `` part matches name-like tokens (word chars, + # dots, hyphens, single inline spaces) to avoid over-stripping + # substantive technical parentheticals that happen to contain + # the word `follow-up`. + r"[ \t]*\([\w.][\w. -]*[ \t]+follow-up\)\.?[ \t]*$", ] # Leading passes twice — strip order reveals nested tags. diff --git a/.claude/skills/security-issue-sync/SKILL.md b/.claude/skills/security-issue-sync/SKILL.md index 9ae896bf..bf0b173f 100644 --- a/.claude/skills/security-issue-sync/SKILL.md +++ b/.claude/skills/security-issue-sync/SKILL.md @@ -848,7 +848,7 @@ update, label change, or next-step recommendation in Step 2: | The tracker is an **incomplete-fix follow-up to another CVE** — detected by any of: the rollup or body mentions *"incomplete fix for `CVE-YYYY-NNNNN`"* / *"follow-up to `CVE-YYYY-NNNNN`"* / *"sibling tracker"*; the title contains a *"(incomplete fix for `CVE-YYYY-NNNNN`)"* parenthetical; the `affected[]` array names a different `packageName` than the referenced prior CVE; OR the tracker was opened as a split from a closed-`announced` tracker whose CVE is already PUBLISHED — **AND** the *Short public summary for publish* body field does not yet contain BOTH (a) the prior `CVE-YYYY-NNNNN` ID verbatim AND (b) a *"users who already applied [the prior CVE's fix] should also apply this one"* clause naming the current product/package. | Propose expanding the summary to add the cross-CVE + cross-product upgrade ask per the *"Incomplete-fix-to-another-CVE"* paragraph in Step 2b. Concretely, the summary must (1) name the prior CVE explicitly, (2) state that the prior fix did not cover the current product/surface, (3) tell users who already applied the prior fix to **also** apply this one (the two are complementary, not duplicates). **Why:** when a CVE is published as a follow-up to a prior CVE, the reader's default reading is *"I already applied the earlier fix; this is a duplicate."* Without explicit cross-CVE + cross-product framing in the summary, downstream consumers miss that two upgrades are needed. Apply this fix on every sync run that surfaces an incomplete-fix tracker whose summary lacks the cross-CVE clause, even when no other body update is being proposed. After updating, regenerate the CVE JSON attachment so the published `descriptions[].value` reflects the cross-CVE relationship. | | The *"CWE"* body field is populated with a bare `CWE-NNN` token (no description text) — e.g. `CWE-22` or `CWE-502` alone, without the canonical short description that follows in the format `CWE-NNN: ` | Propose expanding the field to `CWE-NNN: <Canonical Title>` per the MITRE CWE catalog (e.g. `CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')`, `CWE-502: Deserialization of Untrusted Data`, `CWE-601: URL Redirection to Untrusted Site ('Open Redirect')`). **Prefer a CWE from the project's *advised CWEs* list** when one is declared in [`<project-config>/scope-labels.md`](../../../<project-config>/scope-labels.md) or the project's CVE-tool config — the advised list captures the CWE classes the project's security team has standardised on, and using one from the list makes cross-CVE comparison cleaner. **Why:** the published CVE record's `problemTypes[].descriptions[].description` field carries the human-readable text the advisory mailing list and `cve.org` render; a bare `CWE-NNN` is technically a valid identifier but useless to readers who don't keep the MITRE numbering in their head. The longer form costs nothing to add and significantly improves the published advisory's clarity. Apply on every sync run that surfaces a bare CWE token. After updating, regenerate the CVE JSON attachment so `problemTypes[]` carries the expanded form. | | The tracker's *Security mailing list thread* body field references a **private scanner product** (declared in [`<project-config>/scanner-products.md`](../../../<project-config>/scanner-products.md) — e.g. internal SAST, partner-shared scan, unpublished bug-bounty pipeline) **AND** the *Reporter credited as* body field names a person rather than `anonymous` / a public handle, **AND** there is no signal the finder consented to public credit (no inbound `security@` message from them under their own name, no public HackerOne / huntr.dev report URL on the thread, no explicit *"please credit me as `<name>`"* line). | Propose rewriting the *Reporter credited as* field to `anonymous` and stripping the scanner product name from the *Short public summary for publish* body field text (e.g. *"Mythos scan flagged that…"* → drop the scanner-product clause; *"Imported from internal SAST"* → drop). **Audit-trail surfaces stay untouched**: the *Security mailing list thread* body field, the status-rollup comment, and the Gmail thread keep the original scanner-product + person-name references for security-team auditing. Only the CVE-record-bound surfaces (summary, credit) get the anonymise scrub. **Why:** scanner-tool product names are commercial / IP-sensitive (naming the scanner publicly amounts to free advertising and some contracts restrict attribution), and individual finders sourced from private channels haven't consented to public credit (their org pointed a scanner at the codebase and shared findings privately — there was no `security@` thread asking to be named). The *combination* of (named individual + named proprietary scanner) also pattern-leaks the discovery channel and how that org runs security on its codebase. Apply on every sync run that surfaces the signal; the *Reporter credited as* field is read verbatim by the CVE-JSON generator into `credits[]` and the *Short public summary for publish* is read into `descriptions[].value`. The same scrub re-runs as the sixth pre-push hygiene gate in [Step 5b 1b](#decision-flow) — it catches the case where the body update was missed at proposal time and the JSON would otherwise ship with the scanner name still in it. **Exempt cases**: when the finder already self-credited under a public name (HackerOne report URL, huntr.dev public report URL, the reporter's own `security@` message naming themselves), keep the named credit — the scrubber must not anonymise a credit that was already public elsewhere. | -| The **issue title** contains adopter-specific or internal noise that would otherwise ship to the public CVE record — leading or trailing project-name tokens (e.g. ``Apache Airflow:`` / ``in Apache Airflow`` / ``(Apache Airflow X.Y)``), internal split markers (``(split from #NNN)`` / ``(split for scope clarity from #NNN)``), report-form classifiers (``[ Security Report ]`` / ``[Security Issue]``), external-tracker IDs in parentheses or brackets (``[GHSA-xxxx-xxxx-xxxx]``, ``(ZDRES-NNNNN)``, ``(HUNTR-NNNNN)``, ``(GHSL-NNNN-NNN)``), or version-noise suffixes (``(v3.2.1)``, ``(3.x)``). The check applies on every sync pass, including trackers whose title was previously clean but has drifted since allocation. | Propose updating the title via `gh issue edit <N> --title "<cleaned>"`. **Reuse the [`security-cve-allocate` Step 2 title-strip cascade](../security-cve-allocate/SKILL.md#step-2--compute-the-cve-ready-title)** — both the leading-pattern set (project-name tokens, `Security (Report\|Issue\|Vulnerability\|Bug)` prefixes) and the trailing-pattern set (`in (Apache )?Airflow`, GHSA/ZDRES/HUNTR/GHSL trailing IDs, `(split from #N)` parentheticals). **Why this matters even though `security-cve-allocate` already strips at allocation time:** the GitHub issue title is read **verbatim** by the CVE-JSON generator into `containers.cna.title`, which ships in the published advisory and on `cve.org`. Titles drift between allocation and the final regen (manual edits to add context, sibling-tracker splits, GHSA-relay imports that append the GHSA ID), so the sync skill must re-run the same cleanup on every pass. **Preserve stripped context as audit trail** in the issue body (a `### Related references` section near the bottom) or in the rollup — internal pointers like *"split from [#NNN](...)"* are useful for the security team and must not be silently lost; just move them off the user-facing title. After updating the title, regenerate the CVE JSON attachment so the published `title` field reflects the cleaned value. If the strip would collapse the title to fewer than 3 words, **flag the ambiguity** in the proposal (matching `security-cve-allocate`'s safety) and let the user override — over-stripping is worse than leaving one redundant word. | +| The **issue title** contains adopter-specific or internal noise that would otherwise ship to the public CVE record — leading or trailing project-name tokens (e.g. ``Apache Airflow:`` / ``in Apache Airflow`` / ``(Apache Airflow X.Y)``), internal split markers (``(split from #NNN)`` / ``(split for scope clarity from #NNN)``), report-form classifiers (``[ Security Report ]`` / ``[Security Issue]``), external-tracker IDs in parentheses or brackets (``[GHSA-xxxx-xxxx-xxxx]``, ``(ZDRES-NNNNN)``, ``(HUNTR-NNNNN)``, ``(GHSL-NNNN-NNN)``), version-noise suffixes (``(v3.2.1)``, ``(3.x)``), trailing prior-CVE-relationship parentheticals (``(CVE-YYYY-NNNNN)`` / ``(possible CVE-YYYY-NNNNN variant)`` / ``(incomplete fix for CVE-YYYY-NNNNN)`` / ``(fix-bypass of CVE-YYYY-NNNNN)`` — the cross-CVE relationship belongs in the public summary's Gate #3 clause, never in the title), or trailing reporter-name attribution parentheticals (``(Evan Ricafort follow-up)`` / ``(<name> follow-up)`` — reporter attribution belongs in the credits field, never in the public title). The check applies on every sync pass, including trackers whose title was previously clean but has drifted since allocation. | Propose updating the title via `gh issue edit <N> --title "<cleaned>"`. **Reuse the [`security-cve-allocate` Step 2 title-strip cascade](../security-cve-allocate/SKILL.md#step-2--compute-the-cve-ready-title)** — both the leading-pattern set (project-name tokens, `Security (Report\|Issue\|Vulnerability\|Bug)` prefixes) and the trailing-pattern set (`in (Apache )?Airflow`, GHSA/ZDRES/HUNTR/GHSL trailing IDs, `(split from #N)` parentheticals). **Why this matters even though `security-cve-allocate` already strips at allocation time:** the GitHub issue title is read **verbatim** by the CVE-JSON generator into `containers.cna.title`, which ships in the published advisory and on `cve.org`. Titles drift between allocation and the final regen (manual edits to add context, sibling-tracker splits, GHSA-relay imports that append the GHSA ID), so the sync skill must re-run the same cleanup on every pass. **Preserve stripped context as audit trail** in the issue body (a `### Related references` section near the bottom) or in the rollup — internal pointers like *"split from [#NNN](...)"* are useful for the security team and must not be silently lost; just move them off the user-facing title. After updating the title, regenerate the CVE JSON attachment so the published `title` field reflects the cleaned value. If the strip would collapse the title to fewer than 3 words, **flag the ambiguity** in the proposal (matching `security-cve-allocate`'s safety) and let the user override — over-stripping is worse than leaving one redundant word. | | A release carrying the fix has shipped. Detection is **scope-dependent** — different scope labels on a project can ride different release trains, each with its own *"is it released?"* signal (which artifact registry to consult, what to query, how to map a tracker's milestone to that registry, partial-release edge cases). The per-scope detection recipe lives in [`<project-config>/scope-labels.md` — *Detecting that a fix release has shipped*](../../../<project-config>/scope-labels.md#detecting-that-a-fix-release-has-shipped). The "or an explicit *fix shipped in X.Y.Z* comment" fallback applies across all scopes regardless of the project-specific signal. | **Two-stage gate: every mandatory CVE field must be populated AND the CVE record state in Vulnogram must be `REVIEW`.** Before proposing either the label swap or the assignee swap, run both checks. **Stage 1 — body fields**: check that all six body fields are populated (not empty, not `_No response_`): *CWE*, *Affected versions*, *Severity*, *Reporter credited as*, *Short public summary for publish*, *PR with the fix*. If any is missing, **do NOT propose the hand-off**. Instead, propose posting (or PATCH-updating) the *Remediation-developer fill-fields comment* per the dedicated bullet in Step 2b — issue stays assigned to the remediation developer; no label swap, no assignee swap, no RM hand-off. **Stage 2 — CVE state**: with Stage 1 clear, Step 5b's `vulnogram-api-record-update` push includes `body.CNA_private.state = "REVIEW"` (the new auto-promote behaviour — see Step 5b for details). After the push, verify the record state is now `REVIEW` (via `vulnogram-api-record-fetch` / the equivalent state probe). If the state is still `DRAFT` after the push (push failed, CNA-schema validation rejected the JSON, transient error), **re-fire the fill-fields comment** with the refreshed blocker description, and **do NOT propose the hand-off / label swap / assignee swap on this pass**. The RM never receives a hand-off while the record is in `DRAFT`. **When both stages are clear (state == REVIEW)**: propose swapping `pr merged` → `fix released` (Step 12). This is the release manager's cue to own Steps 13–15 (advisory send → URL capture → Vulnogram PUBLIC → close). **Also propose swapping the assignee from the remediation developer to the release manager** (looked up via the three-source cascade in Step 2c — [`<project-config>/release-trains.md`](../../../<project-config>/release-trains.md) "Release managers for releases currently relevant to the security tracker" → Release Plan wiki → `[RESULT][VOTE]` thread on `dev@`), so the issue list reflects ownership hand-off. See the *Assignee hand-off at the `fix released` transition* paragraph under **Assignees** in Step 2b for the full rule. | | GHSA state transition (opened, accepted, published, rejected) in a GHSA-forwarded email | If the GHSA is closed as "not accepted" but the security team accepted the report on `security@`, flag the divergence in the status comment so it is not lost. | | Team member saying *"let's also backport to v3-2-test"* / *"please mark X for backport"* | Note the requested backport label on the public PR as an item for Step 9 of the `security-issue-fix` workflow. |