feat(export): SBOM export + declared-license recorder (U5 + U6)#1820
Part of #1774. Implements work unit U6 of issue #1777: a declared-license recorder that mirrors npm -- trust the package manifest's declared license, never parse LICENSE text, judge nothing.
- Add `declared_license` to LockedDependency (to_dict omits when None; from_dict restores; absence means unknown, no sentinel stored).
- Backfill at resolve time from the resolved dep's `apm.yml` `license:` (APM packages) or `plugin.json` `license` (ingested plugins) via read_declared_license(). Wired through the three install source sites and attached in LockfileBuilder.
- Offline SPDX syntax classifier (bundled id/exception sets, no network, no dependency added). Three states, never collapsed: valid id / expression -> passthrough; special token (UNLICENSED, SEE LICENSE IN) -> named assertion; undeclared -> NOASSERTION.
- Authoring-path warn: `apm pack` / `apm publish` nudge when the author's own apm.yml declares no license. Silent on the consuming path -- never nag about transitive deps. Never blocks.
declared != concluded. Invalid SPDX is recorded verbatim and never raises.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
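The three-state classification in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: the id and exception sets are tiny stand-ins for the bundled SPDX tables, `Classified` and `KIND_NAMED` are hypothetical names, and the expression check is a loose syntax pass rather than a full SPDX grammar.

```python
import re
from typing import NamedTuple, Optional

# Stand-ins for the bundled SPDX tables; the real sets are far larger.
SPDX_LICENSE_IDS = {"MIT", "Apache-2.0", "GPL-2.0-only", "BSD-3-Clause"}
SPDX_EXCEPTION_IDS = {"Classpath-exception-2.0"}

KIND_ID = "id"
KIND_EXPRESSION = "expression"
KIND_NAMED = "named"  # hypothetical name for the "named assertion" state

_OPERATORS = {"AND", "OR", "WITH"}


class Classified(NamedTuple):
    kind: str
    value: str


def _is_license_id(token: str) -> bool:
    """Known SPDX id (optionally with a trailing '+') or a LicenseRef-* token."""
    bare = token[:-1] if token.endswith("+") else token
    if bare in SPDX_LICENSE_IDS:
        return True
    return re.fullmatch(r"LicenseRef-[A-Za-z0-9.\-]+", bare) is not None


def _looks_like_expression(text: str) -> bool:
    """Loose check: balanced parens, operators between operands, known ids."""
    depth = 0
    expect_operand = True
    prev_op = None
    for tok in re.findall(r"\(|\)|[^\s()]+", text):
        if tok == "(":
            if not expect_operand:
                return False
            depth += 1
        elif tok == ")":
            if expect_operand or depth == 0:
                return False
            depth -= 1
        elif tok in _OPERATORS:
            if expect_operand:
                return False
            expect_operand, prev_op = True, tok
        else:
            if not expect_operand:
                return False
            # The operand after WITH must be an exception id, not a license id.
            known = tok in SPDX_EXCEPTION_IDS if prev_op == "WITH" else _is_license_id(tok)
            if not known:
                return False
            expect_operand, prev_op = False, None
    return depth == 0 and not expect_operand


def classify_declared_license(declared: Optional[str]) -> Optional[Classified]:
    """Classify a manifest-declared license string; never raises on bad input."""
    if declared is None or not declared.strip():
        return None  # undeclared: nothing is stored
    text = declared.strip()
    if _is_license_id(text):
        return Classified(KIND_ID, declared)
    if _looks_like_expression(text):
        return Classified(KIND_EXPRESSION, declared)
    # Special tokens and unrecognized strings are kept verbatim as a named license.
    return Classified(KIND_NAMED, declared)
```

Under these assumptions an unrecognized string falls through to the named state instead of raising, which matches the "recorded verbatim and never raises" rule.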
Part of #1774. Implements work unit U5 of issue #1777: an SBOM/provenance export that serializes the existing lockfile into CycloneDX or SPDX. This is an inventory export, not a security attestation.
- `apm lock export --format cyclonedx|spdx` -- a format flag on the lockfile-export concept, NOT a new `apm sbom` verb. `lock` becomes a group with invoke_without_command so bare `apm lock` still resolves.
- Reads the lockfile ONLY: never re-resolves, re-hashes, or touches the network or filesystem. Component identity is a purl derived from recorded fields (pkg:github/.. for git, pkg:oci/.. for registry, pkg:generic/.. for local); the forge is inferred from the recorded repo_url host when host_type is absent.
- Deterministic output: components sorted by purl, a pinned timestamp (--timestamp > SOURCE_DATE_EPOCH > lockfile generated_at), stable key order -> byte-identical across runs (golden-file test).
- Surfaces declared_license per the npm-faithful representation: valid id -> license.id; expression -> expression; special/unknown -> named; undeclared -> CycloneDX omits licenses[], SPDX writes NOASSERTION.
- Scrubs any embedded credentials from recorded URLs before emit.
- Docs: security-and-supply-chain (SBOM section + positioning), lockfile-spec (declared_license field + schema), apm-usage skill resources (commands, package-authoring).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
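The determinism rules in this commit (timestamp precedence, components sorted by purl, stable key order) can be illustrated with a short sketch. The function names and the two-key document shape are hypothetical; real CycloneDX and SPDX documents carry many more fields.

```python
import json
import os
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def resolve_timestamp(cli_timestamp: Optional[str], generated_at: str) -> str:
    """Pick the pinned timestamp: --timestamp > SOURCE_DATE_EPOCH > generated_at."""
    if cli_timestamp:
        return cli_timestamp
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch:
        moment = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return generated_at


def render_inventory(components: List[Dict[str, Any]], timestamp: str) -> str:
    """Serialize with components sorted by purl and keys in sorted order."""
    document = {
        "timestamp": timestamp,
        "components": sorted(components, key=lambda c: c["purl"]),
    }
    # sort_keys fixes key order at every nesting level, so the same inputs
    # always produce the same bytes regardless of dict insertion order.
    return json.dumps(document, sort_keys=True, indent=2) + "\n"
```

Because nothing in the output depends on wall-clock time or input ordering, two runs over the same lockfile can be compared byte for byte, which is what a golden-file test relies on.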
Pull request overview
Adds a lockfile-derived SBOM/inventory export (apm lock export --format cyclonedx|spdx) and introduces declared_license provenance capture (manifest-declared license recorded into apm.lock.yaml and surfaced in SBOM output), alongside authoring-path warnings for missing license: in an author's own apm.yml.
Changes:
- Add `src/apm_cli/export/` with SPDX classification data + deterministic CycloneDX/SPDX JSON serializers and purl/URL-scrub helpers.
- Extend lockfile model + install pipeline to capture `declared_license` at acquire time and persist it in `apm.lock.yaml`.
- Convert `apm lock` into a Click group and add `apm lock export` with deterministic timestamping and stdout/file output support; update docs/schema accordingly.
| File | Description |
|---|---|
| tests/unit/test_lockfile_declared_license.py | Unit coverage for LockedDependency.declared_license serialization/roundtrip behavior. |
| tests/unit/install/phases/test_lockfile_declared_license.py | Tests for attaching declared-license provenance into lockfile entries. |
| tests/unit/export/test_spdx.py | Tests for offline SPDX declared-license classification behavior. |
| tests/unit/export/test_sbom.py | Tests for CycloneDX/SPDX output shape, determinism, and credential scrubbing. |
| tests/unit/export/test_sbom_golden.py | Golden fixtures enforcing byte-identical deterministic SBOM output. |
| tests/unit/export/test_purl.py | Tests for purl identity derivation and URL credential scrubbing. |
| tests/unit/export/test_declared_license.py | Tests for reading declared license from apm.yml / plugin.json. |
| tests/unit/export/test_authoring.py | Tests for authoring-path warning behavior on missing license:. |
| tests/unit/export/golden/sbom.spdx.json | Golden SPDX JSON fixture. |
| tests/unit/export/golden/sbom.cyclonedx.json | Golden CycloneDX JSON fixture. |
| tests/unit/commands/test_lock_export_command.py | CLI tests for apm lock export behavior (format, output, timestamp, no-resolve). |
| src/apm_cli/install/sources.py | Acquire-time backfill of ctx.package_declared_licenses from installed manifests. |
| src/apm_cli/install/phases/lockfile.py | LockfileBuilder attaches declared licenses into LockFile entries. |
| src/apm_cli/install/context.py | Adds package_declared_licenses map to install context. |
| src/apm_cli/export/spdx.py | SPDX declared-license classifier (id vs expression vs named assertion). |
| src/apm_cli/export/spdx_data.py | Bundled SPDX license/exception ID data for offline classification. |
| src/apm_cli/export/sbom.py | Deterministic CycloneDX/SPDX serializers reading only lockfile fields. |
| src/apm_cli/export/purl.py | purl derivation + URL userinfo scrubbing for export output. |
| src/apm_cli/export/declared_license.py | Reads declared license from dependency install path manifests. |
| src/apm_cli/export/authoring.py | Authoring-path warning for missing license: in the author's apm.yml. |
| src/apm_cli/export/__init__.py | Package init/documentation for export subsystem. |
| src/apm_cli/deps/lockfile.py | Adds declared_license field to LockedDependency serialization and known-key set. |
| src/apm_cli/commands/publish.py | Hooks authoring warning into apm publish. |
| src/apm_cli/commands/pack.py | Hooks authoring warning into apm pack (suppressed under --json). |
| src/apm_cli/commands/lock.py | Converts lock to group; adds lock export subcommand and timestamp resolution. |
| packages/apm-guide/.apm/skills/apm-usage/package-authoring.md | Documents license: field semantics for SBOM/lockfile provenance. |
| packages/apm-guide/.apm/skills/apm-usage/commands.md | Documents new apm lock export command surface. |
| docs/src/content/docs/reference/lockfile-spec.md | Adds declared_license to lockfile reference spec. |
| docs/src/content/docs/enterprise/security-and-supply-chain.md | Documents SBOM export and declared-license semantics. |
| docs/public/specs/schemas/lockfile-v0.1.schema.json | Adds declared_license to the published lockfile schema. |
Copilot's findings
Comments suppressed due to low confidence (1)
tests/unit/export/test_purl.py:118
- Avoid substring assertions on URLs in tests; CI CodeQL flags incomplete URL substring sanitization patterns. Parse the scrubbed URL and assert on hostname/netloc instead.
def test_scrub_url_handles_oci_scheme():
    scrubbed = scrub_url("oci://user:tok@registry.example.com/acme/oci-tools@sha256:abc")
    assert "tok" not in scrubbed
    assert "registry.example.com" in scrubbed
- Files reviewed: 30/30 changed files
- Comments generated: 7
| if not declared: | ||
| return None | ||
| result = classify_declared_license(declared) | ||
| if result.kind == KIND_ID: |
| if not declared: | ||
| return _NOASSERTION | ||
| return declared |
| def _is_valid_license_id(token: str) -> bool: | ||
| """Whether *token* is a recognized SPDX id (allowing a trailing ``+``).""" | ||
| bare = token[:-1] if token.endswith("+") else token | ||
| return bool(bare) and (bare in SPDX_LICENSE_IDS or _is_license_ref(token)) | ||
|
|
| def test_licenseref_is_expression_or_id(): | ||
| # LicenseRef-* is a valid SPDX simple expression element (no public id). | ||
| result = classify_declared_license("LicenseRef-MyLicense") | ||
| assert result.kind in (KIND_ID, KIND_EXPRESSION) | ||
| assert result.value == "LicenseRef-MyLicense" |
| | `constraint` | string | git-source semver only | The original semver range from `apm.yml` (`^1.2.0`, `~1.4`). Present when `ref:` was a range; used by drift detection so a manifest range vs. a locked tag (`v1.5.3`) is not a false positive, and by lockfile replay to pin the resolved tag deterministically across installs. | | ||
| | `resolved_tag` | string | git-source semver or SHA-pin updates | The concrete annotated git tag (`v1.5.3`, `widget--v1.5.3`) that satisfied `constraint` or justified the latest full-SHA revision-pin update. | | ||
| | `resolved_at` | string | git-source semver only | RFC 3339 timestamp of the resolution. Surfaces "how stale is this pin?" in `apm why`. | | ||
| | `declared_license` | string | no | The license the package *manifest declares* (`license:` in `apm.yml`, or `license` in a `plugin.json`), recorded verbatim at resolve time and syntax-validated offline against the bundled SPDX id set. An author **claim**, not a conclusion from `LICENSE` text -- APM never reads the license file. Omitted when undeclared (absence means unknown; no sentinel is stored). Surfaced by `apm lock export`. | |
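The "omitted when undeclared, no sentinel" behaviour of `declared_license` might be sketched as follows. Only two fields are modelled here, and the method bodies are assumptions; the real `LockedDependency` carries many more fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LockedDependency:
    # Reduced stand-in for the real class, for illustration only.
    repo_url: str
    declared_license: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        data: Dict[str, Any] = {"repo_url": self.repo_url}
        # Absence means unknown: the key is left out, never written as a placeholder.
        if self.declared_license is not None:
            data["declared_license"] = self.declared_license
        return data

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "LockedDependency":
        return cls(
            repo_url=data["repo_url"],
            declared_license=data.get("declared_license"),
        )
```

A lockfile written before this field existed round-trips unchanged under this scheme, because a missing key and `None` are the same state.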
| **Declared license, npm-faithful.** APM records the license the package | ||
| *manifest declares* (`license:` in `apm.yml`, or `license` in a | ||
| `plugin.json`) into the lockfile's `declared_license` field at resolve | ||
| time, syntax-validates it offline against the bundled SPDX id set, and | ||
| passes it through to the SBOM. APM never reads or interprets the text of a |
| The value is syntax-validated **offline** against the bundled SPDX id set. | ||
| An unrecognized string (or a special token like `UNLICENSED` or | ||
| `SEE LICENSE IN <file>`) is **never** rejected -- it is recorded verbatim | ||
| and emitted in the SBOM as a named license. Authoring never blocks on a | ||
| license value. |
…afe) Folds the apm-review-panel recommendations on PR #1820 (U5/U6) inside the PR's stated scope: - security (supply-chain + auth): scrub_url now strips the entire query string (access_token, SAS sig, ...) in addition to userinfo, so a credential-bearing recorded URL can never leak into SBOM output. - perf/architecture: format identifiers moved to a dependency-free export/formats.py so importing the lock command no longer eagerly loads the bundled SPDX id table; the serializer import stays deferred. - cli ux: lock export routes every diagnostic to stderr, keeping `apm lock export | jq` clean on success and error. - docs: cli/lock.md gains an Export (SBOM) section; manifest-schema 3.5 license expanded with the declared_license semantics; security page vendor-neutral framing + query-string scrub note; cross-links + MD012. - tests: query-string + SAS scrub regression, lazy-import probe, and a CLI-level pack warn / export-silent asymmetry trap (mutation-break verified). Deferred (scope-crossing, noted on PR): machine-verifiable SPDX-table provenance regeneration + CI check. Part of #1774 apm-spec-waiver: additive provenance/export only -- the declared-license recorder records a license only when the package manifest declares one (it never concludes a license from LICENSE file text) and SBOM export is a read-only serializer of already-recorded lockfile fields; this PR introduces no new normative apm-policy MUST, so no req-XXX spec anchor applies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
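As an illustration of the scrub described in this commit (drop userinfo and the whole query string), here is a minimal sketch built on `urllib.parse`. The body is an assumption about how such a helper could work, not the PR's `scrub_url`; note that it also lowercases the host, because `hostname` does.

```python
from urllib.parse import urlsplit, urlunsplit


def scrub_url(url: str) -> str:
    """Return *url* without userinfo or query string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    # Rebuild the netloc from host and port only, which drops user:password@.
    netloc = f"{host}:{parts.port}" if parts.port else host
    # An empty query removes access tokens, SAS signatures, and similar values.
    return urlunsplit((parts.scheme, netloc, parts.path, "", parts.fragment))
```

Assertions against the result are best made on parsed components (`urlsplit(...).hostname`, `.password`, `.query`) rather than by substring, in line with the CodeQL finding quoted earlier.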
Part of #1774. Security (CodeQL HIGH py/incomplete-url-substring-sanitization): - test_scrub_url_handles_oci_scheme asserted the host via a substring `in` check; rewrote it to compare urlsplit().hostname per the repo tests.instructions ("URL assertions must use urllib.parse, never substring"). Clears the 1 new high-severity alert on this PR. Supply-chain (identity hardening): - build_purl now percent-encodes every namespace/name segment so a crafted dependency name cannot inject purl-structural characters (`/`, `@`, `#`, `?`, whitespace) into the component identity. Clean forge slugs are unchanged (golden fixtures byte-stable). New traps in test_purl.py; mutation-break verified. Folded advisory nits (revision-2 panel): - python-architect: drop format constants from sbom.py __all__; the canonical dependency-free home is export/formats.py. - cli-logging: tighten two pre-fold export tests to assert result.stderr instead of mixed result.output. - test-coverage: add CLI-level publish warn-vs-silent regression trap (test_publish_cli_surface.py) symmetric to the pack path; mutation-break verified. - doc-writer: clarify CycloneDX-omit vs SPDX-NOASSERTION in manifest-schema 3.5 (the default format omits, only SPDX writes the literal NOASSERTION). - oss-growth: neutralize a vendor name in the apm-usage skill resource. Deferred (scope/process): README SBOM feature bullet (README edits need maintainer approval per repo rule); group-vs-subcommand --global flag (idiomatic Click, accepted as-is). apm-spec-waiver: additive provenance/export only -- the declared-license recorder records a license only when the package manifest declares one (it never concludes a license from LICENSE file text) and SBOM export is a read-only serializer of already-recorded lockfile fields; this PR introduces no new normative apm-policy MUST, so no req-XXX spec anchor applies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
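The purl hardening in this commit comes down to encoding each namespace/name segment with `quote(safe="")` while leaving the version or digest untouched. The sketch below shows the idea; `build_purl_sketch` and its signature are hypothetical and not the PR's `build_purl`.

```python
from typing import List, Optional
from urllib.parse import quote


def build_purl_sketch(purl_type: str, segments: List[str], version: Optional[str] = None) -> str:
    """Join percent-encoded namespace/name segments into a purl string."""
    # safe="" makes quote() encode "/", "@", "#", "?" and whitespace as well,
    # so a crafted name cannot introduce purl-structural characters.
    encoded = "/".join(quote(segment, safe="") for segment in segments)
    purl = f"pkg:{purl_type}/{encoded}"
    if version:
        # Left unencoded here, mirroring the note that version/digest stay intact.
        purl += f"@{version}"
    return purl
```

Clean slugs such as `acme/tools` contain only unreserved characters, so they pass through unchanged, which is why existing golden fixtures would stay byte-stable.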
APM Review Panel:
| Persona | B | R | N | Takeaway |
|---|---|---|---|---|
| Supply Chain Security | 0 | 0 | 1 | Ship-ready; query-string scrub + NOASSERTION fail-closed verified; purl percent-encoding nit folded (identity-spoofing hard line closed). |
| Python Architect | 0 | 0 | 1 | Cleanly layered, simplest-correct at scope; sbom.py __all__ over-export nit folded to canonical formats.py. |
| Cli Logging | 0 | 0 | 1 | stderr routing fold holds; authoring WARN actionable and symbol-delegated; two mixed-output test nits folded to result.stderr. |
| Devx Ux | 0 | 0 | 1 | Command surface clean, pipe-safe, idiomatic; group vs subcommand --global nit deferred (idiomatic Click, persona accepts). |
| Oss Growth Hacker | 0 | 1 | 1 | SBOM lowers procurement gate, framing clean; README bullet deferred (maintainer approval); one skill vendor-name folded. |
| Doc Writer | 0 | 0 | 1 | lock/manifest/lockfile-spec/security pages accurate vs code; CycloneDX-omit-vs-SPDX-NOASSERTION nuance folded. |
| Test Coverage | 0 | 0 | 1 | 145 tests pass, all five critical surfaces trapped; missing publish-warn CLI test folded (mutation-break verified). |
B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.
Top 2 follow-ups
- [Oss Growth Hacker] Add SBOM/provenance export bullet to README feature list -- Hero-page visibility for the enterprise-procurement unlock; deferred only because README edits require maintainer sign-off per repo rule.
- [Devx Ux] Evaluate group `--global` flag vs subcommand pattern for export commands -- Minor UX consistency question; persona accepts current design as idiomatic Click matching established CLI precedent. Revisit only if user feedback surfaces confusion.
Recommendation
Merge at maintainer discretion. All blocking and recommended findings are resolved; CI is fully green; supply-chain and test-coverage personas independently confirm the security and correctness surfaces hold. The two deferred items are non-code post-merge tasks (README bullet, UX preference review) that carry zero regression risk.
Full per-persona findings
Supply Chain Security
- [nit] purl namespace/name segments were not percent-encoded in `src/apm_cli/export/purl.py`
  A crafted dependency name could otherwise perturb component identity. Folded this pass: `build_purl` now percent-encodes each namespace/name segment (version/digest left intact to preserve oci `sha256:` and golden fixtures); mutation-break verified.
  Suggested: encode each purl segment with `quote(safe="")`.
- Verified folds: full query-string scrub in `scrub_url`, NOASSERTION never upgraded, `declared_license` omitted (not sentineled) when undeclared, authoring-vs-consuming warn asymmetry structurally enforced.
Python Architect
- [nit] `sbom.py` re-exported format constants via `__all__`
  Canonical home is `export/formats.py` (dependency-free); the re-export risked a second import path. Folded: `__all__` trimmed to `["export_sbom"]`.
  Layering otherwise minimal-correct; the lazy-import fold holds.
Cli Logging
- [nit] two pre-fold tests asserted against mixed `result.output`
  Folded: tightened to `result.stderr` so the pipe-safe stderr routing is actually pinned. Authoring WARN wording confirmed actionable ("add a `license:` field to apm.yml") and ASCII-symbol-delegated.
Devx Ux
- [nit] group vs subcommand handling of the `--global` flag
  Deferred: idiomatic Click matching established CLI precedent; persona accepts the current surface as-is. Praised authoring-vs-consuming asymmetry and pipe composability.
Oss Growth Hacker
- [recommended] surface SBOM export as a README feature bullet
  Deferred: README edits require maintainer approval per repo rule. Tracked as the top post-merge follow-up.
- [nit] one vendor name in an apm-usage skill resource
  Folded: neutralized to vendor-free phrasing.
Doc Writer
- [nit] two summary pages flattened CycloneDX-omit vs SPDX-NOASSERTION
  Folded: `manifest-schema.md` now states CycloneDX omits the license entry while SPDX writes the literal NOASSERTION.
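The CycloneDX-omit vs SPDX-NOASSERTION distinction can be made concrete with a small sketch. All function names and the `"id"` / `"expression"` / `"named"` kind strings are hypothetical. The SPDX branch mirrors the snippet quoted in the review above (pass the declared string through, NOASSERTION when undeclared); how named values are finally handled in SPDX output is not settled by this page.

```python
from typing import Any, Dict, List, Optional


def cyclonedx_licenses(kind: Optional[str], value: Optional[str]) -> Optional[List[Dict[str, Any]]]:
    """Return the CycloneDX `licenses` array, or None when it should be omitted."""
    if kind is None:
        return None  # undeclared: no licenses[] entry at all
    if kind == "id":
        return [{"license": {"id": value}}]
    if kind == "expression":
        return [{"expression": value}]
    return [{"license": {"name": value}}]  # named assertion, recorded verbatim


def cyclonedx_component(purl: str, kind: Optional[str], value: Optional[str]) -> Dict[str, Any]:
    """Build a partial component; only the fields relevant here are shown."""
    component: Dict[str, Any] = {"purl": purl}
    licenses = cyclonedx_licenses(kind, value)
    if licenses is not None:
        component["licenses"] = licenses
    return component


def spdx_license_declared(declared: Optional[str]) -> str:
    """SPDX always writes a value; undeclared becomes the literal NOASSERTION."""
    if not declared:
        return "NOASSERTION"
    return declared
```

The asymmetry follows from the formats themselves: a CycloneDX component may simply carry no `licenses` key, while the SPDX side has a dedicated literal for "no claim made".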
Test Coverage
- [nit] the publish-command license warn had no CLI-level regression test
  Folded: added `tests/unit/commands/test_publish_cli_surface.py` symmetric to the pack surface (warn-when-undeclared / silent-when-declared); mutation-break verified. Suite now 145 passing.
Auth Expert -- inactive
No AuthResolver / HostInfo / token-management surface is touched by this PR; the prior credential-scrub fold was verified as a courtesy.
Performance Expert -- inactive
Export is pure in-memory serialization of lockfile-recorded fields; the lazy-import fold eliminated the only import-time cost. No hot path affected.
This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.
Surfaces `apm lock export --format cyclonedx|spdx` on the hero page as
the top post-merge follow-up from the review panel's growth lens.
Framed as inventory ("what reached disk, straight from the lockfile"),
not a compliance attestation -- holds the install+integrity positioning
line. Maintainer-authorized README edit.
Part of #1774
apm-spec-waiver: docs-only README bullet; no normative spec change, spec body untouched
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Integrates the remote 'Update branch' merge (62368eb, main up to #1810) with the local newer-main merge (#1820). CHANGELOG resolved as a faithful union: all [Unreleased] Added entries kept, MCP extra-passthrough entry (#1670/#1765) appears exactly once. Denylist + tests preserved. Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sync the 800-line/complexity tightening branch with origin/main tip 788a09a (8 commits ahead of merge-base 45843c3): SBOM export + declared-license (#1820), dompurify bump (#1789), audit-unmanaged (#1793), ADO sourceBase (#1810), Antigravity target (#1770), marketplace token (#1763), spec-conformance (#1801), declared-license and integrity keys (#1794/#1777). Conflict resolution preserves the strangler-fig extraction: HEAD's relocations into sibling _*.py modules win, with main's feature additions folded into the new homes. Notable folds: - hook_merge.py: thread container key + antigravity dispatch. - audit: route fail_on_drift + LockFile through the audit module so test monkeypatches on apm_cli.commands.audit.* still take effect. Resolve merge-introduced CI regressions under the tightened gates: - ruff complexity: _classify_primitive_type (PLR0911), validate_policy (C901/PLR0912 via _validate_security), _audit_content_scan (PLR0912 via _run_drift_detection). - file-length <=800: split spdx_data.py (_spdx_exception_ids.py), policy_checks.py (_policy_checks_unmanaged.py), pack.py render helpers (into _pack_ops.py); all re-exported for the patch contract. Local CI mirror green: ruff check/format, pylint R0801 10/10, auth-signals, file-length<=800, full unit suite 17225 passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TL;DR
Adds `apm lock export --format cyclonedx|spdx` (an SBOM/inventory export) and a declared-license recorder that follows the established manifest model: trust the manifest's declared license, never parse LICENSE text, judge nothing. Part of #1774 (issue #1777, work units U5 + U6).
Problem (WHY)
APM is the install + integrity plane: it records WHAT reached disk and its provenance. The lockfile already pins commits and hashes, but there was no way to (a) export that inventory as a standard SBOM, or (b) record the license a package declares. This PR closes both gaps without crossing into the compliance plane -- APM still does not scan LICENSE text, conclude a license, or gate install on one.
Approach (WHAT)
U6 -- declared-license recorder (commit 1). Follows the established package-manifest model: record what the manifest declares (`apm.yml` `license:` or `plugin.json` `license`), syntax-validate offline against a bundled SPDX id set, store nothing when undeclared. Three states, never collapsed:
- Valid SPDX id or expression (`MIT`, `(MIT OR Apache-2.0)`) -> `license.id` / `expression`
- Special or unrecognized token (`UNLICENSED`, `SEE LICENSE IN <f>`) -> `license.name` (named assertion)
- Undeclared -> `licenses[]` omitted (CycloneDX) / `NOASSERTION` (SPDX)
Authoring asymmetry: `apm pack` / `apm publish` WARN on a missing license in your own `apm.yml`; install/export of others' deps stays SILENT.
U5 -- SBOM export (commit 2). `apm lock export --format cyclonedx|spdx` serializes the lockfile only -- never re-resolves, re-hashes, or touches the network/filesystem. Component identity is a purl (`pkg:github/..`, `pkg:oci/..`, `pkg:generic/..`). Output is deterministic (sorted by purl, pinned timestamp, stable keys) so two runs are byte-identical. Credentials in recorded URLs are scrubbed before emit.
Implementation (HOW)
- `src/apm_cli/export/`: `spdx.py` (offline classifier), `spdx_data.py` (bundled 729 SPDX ids + 85 exceptions, no dependency added), `declared_license.py` (resolve-time reader), `authoring.py` (warn hook), `purl.py` (identity + credential scrub), `sbom.py` (CycloneDX + SPDX serializers).
- `LockedDependency` gains `declared_license` (omitted when None; no sentinel).
- `lock` command becomes a Click group with `invoke_without_command=True` so bare `apm lock` still resolves; `export` is the new subcommand.
flowchart LR
    A[apm.yml license: / plugin.json license] -->|resolve time| B[lockfile declared_license]
    B -->|apm lock export| C{format}
    C -->|cyclonedx| D[CycloneDX 1.5]
    C -->|spdx| E[SPDX 2.3]
Trade-offs
- A format flag, not an `apm sbom` verb -- a `sbom` verb would imply license/CVE completeness the lockfile lacks. The flag is honest about scope.
- A `LICENSE` file on disk never upgrades NOASSERTION.
- Deferred (v0.2): `license_file` presence-provenance, LICENSE text, signatures (U7).
Validation evidence
- 17251 passed, 2 skipped, 21 xfailed.
- `ruff check`, `ruff format --check`, pylint R0801 (10.00/10), `lint-auth-signals.sh`, file-length <= 2450, ASCII-only.
How to test
Spec conformance (Mode B disposition)
This PR touches normative critical paths (`deps/lockfile.py`, `install/`) but adds no new normative OpenAPM requirement. The declared-license recorder follows the established manifest model: APM records a license only when the manifest declares one, omits it otherwise, and never concludes, scans, or gates. The SBOM export is a read-only inventory serializer. There is no new apm-policy MUST, so no `req-XXX` anchor is appropriate -- minting one would falsely assert a mandate the design deliberately avoids. Recording an auditable waiver per CONTRIBUTING.md "Mode B (silent extension)":
apm-spec-waiver: additive provenance/export only -- the declared-license recorder records a license only when the package manifest declares one (it never concludes a license from LICENSE file text) and SBOM export is a read-only serializer of already-recorded lockfile fields; this PR introduces no new normative apm-policy MUST, so no req-XXX spec anchor applies.