feat(sbom): build purls with packageurl-python#1032
feat(sbom): build purls with packageurl-python#1032mprpic wants to merge 1 commit intopython-wheel-build:mainfrom
Conversation
3b36a55 to
c6a45bf
Compare
📝 WalkthroughWalkthroughThis PR introduces a structured PURL configuration and upstream-source tracking to SBOM generation. A new Pydantic model Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fromager/sbom.py`:
- Around line 62-86: The upstream derivation ignores downstream string purl
overrides because pc is None; modify _build_upstream_purl to detect when the
downstream purl was provided as a full string (use the same logic/path as
_build_downstream_purl to obtain the effective downstream purl), parse it into a
PackageURL, and derive the upstream from that parsed PackageURL (preserving
type/namespace/name/version but omitting qualifiers like repository_url) instead
of falling back to sbom_settings defaults; reference _build_upstream_purl,
_build_downstream_purl, pbi.purl_config, and PackageURL to locate where to parse
the downstream string and build the upstream.
In `@tests/test_sbom.py`:
- Around line 90-104: Update the test_generate_sbom_full_purl_override test to
also assert the upstream purl is derived from the override scheme rather than
defaulting to the pypi purl: after calling sbom.generate_sbom and selecting
wheel = doc["packages"][0], add an assertion that the upstream externalRef
referenceLocator equals "pkg:generic/test-pkg@1.0.0" (i.e. the override scheme +
the original package name/version). This targets the behavior in
sbom.generate_sbom when package_overrides contains a full purl string.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 75666aca-27d6-45e9-9049-6096c4edd405
📒 Files selected for processing (7)
pyproject.tomlsrc/fromager/packagesettings/__init__.pysrc/fromager/packagesettings/_models.pysrc/fromager/packagesettings/_pbi.pysrc/fromager/sbom.pytests/conftest.pytests/test_sbom.py
c6a45bf to
ad732c7
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_sbom.py (1)
116-132: Consider asserting upstream purl for completeness.This test verifies individual field overrides affect the downstream purl, but doesn't assert the upstream derivation. Adding one line would strengthen coverage:
wheel = doc["packages"][0] + upstream = doc["packages"][1] assert wheel["externalRefs"][0]["referenceLocator"] == ( "pkg:generic/custom-name@1.0.0" ) + assert upstream["externalRefs"][0]["referenceLocator"] == ( + "pkg:generic/custom-name@1.0.0" + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_sbom.py` around lines 116 - 132, Add an assertion in test_generate_sbom_purl_field_overrides to also verify the upstream/original purl is preserved; after obtaining wheel = doc["packages"][0], assert that the upstream purl equals the expected original purl derived from Requirement("test-pkg==1.0.0") (e.g., "pkg:pypi/test-pkg@1.0.0") so the test checks both the overridden externalRefs value and the original upstream purl in the generated SBOM.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test_sbom.py`:
- Around line 116-132: Add an assertion in
test_generate_sbom_purl_field_overrides to also verify the upstream/original
purl is preserved; after obtaining wheel = doc["packages"][0], assert that the
upstream purl equals the expected original purl derived from
Requirement("test-pkg==1.0.0") (e.g., "pkg:pypi/test-pkg@1.0.0") so the test
checks both the overridden externalRefs value and the original upstream purl in
the generated SBOM.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bf06ffce-1dcc-4aee-b3df-656a23d42408
📒 Files selected for processing (7)
pyproject.tomlsrc/fromager/packagesettings/__init__.pysrc/fromager/packagesettings/_models.pysrc/fromager/packagesettings/_pbi.pysrc/fromager/sbom.pytests/conftest.pytests/test_sbom.py
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (3)
- src/fromager/packagesettings/init.py
- tests/conftest.py
- src/fromager/packagesettings/_models.py
|
For markupsafe in builder, this now generates: {
"spdxVersion": "SPDX-2.3",
"dataLicense": "CC0-1.0",
"SPDXID": "SPDXRef-DOCUMENT",
"name": "markupsafe-3.0.3",
"documentNamespace": "https://www.redhat.com/markupsafe-3.0.3.spdx.json",
"creationInfo": {
"created": "2026-04-07T20:12:54Z",
"creators": [
"Organization: Red Hat",
"Tool: fromager-0.1.dev2074+gad732c737"
]
},
"packages": [
{
"SPDXID": "SPDXRef-wheel",
"name": "markupsafe",
"versionInfo": "3.0.3",
"downloadLocation": "NOASSERTION",
"supplier": "Organization: Red Hat",
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceType": "purl",
"referenceLocator": "pkg:pypi/markupsafe@3.0.3?repository_url=https://packages.redhat.com"
}
]
},
{
"SPDXID": "SPDXRef-upstream",
"name": "markupsafe",
"versionInfo": "3.0.3",
"downloadLocation": "NOASSERTION",
"supplier": "NOASSERTION",
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceType": "purl",
"referenceLocator": "pkg:pypi/markupsafe@3.0.3"
}
]
}
],
"relationships": [
{
"spdxElementId": "SPDXRef-DOCUMENT",
"relationshipType": "DESCRIBES",
"relatedSpdxElement": "SPDXRef-wheel"
},
{
"spdxElementId": "SPDXRef-wheel",
"relationshipType": "GENERATED_FROM",
"relatedSpdxElement": "SPDXRef-upstream"
}
]
}with: |
ad732c7 to
de64f57
Compare
|
After another round of reviews, I decided to remove the option of setting the purl as a string (with interpolated values for name and version) in per-package settings and only support the purl object. Overriding the individual fields using the object is easy enough without having to support two methods for doing it. The latest commit also refactors the test data and thus fixes #1042. |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/fromager/packagesettings/_models.py (1)
72-117:⚠️ Potential issue | 🟠 MajorRestore the raw-string
purloverride mode.
PackageSettings.purlnow only acceptsPurlConfig, so a package override likepurl: "pkg:github/org/repo@1.0.0"will fail validation. That makes the full per-package override path from the PR design unreachable, and the downstream cascade can only ever use field overrides plus globals. Please keeppurlaccepting either a full string or structured overrides, and enforce those two modes as mutually exclusive in a model validator. Add a regression test for the string form as part of the fix.Also applies to: 414-419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fromager/packagesettings/_models.py` around lines 72 - 117, PackageSettings currently only accepts PurlConfig so restore the dual-mode by making the PackageSettings.purl field accept either a raw purl string or a PurlConfig instance (e.g. Union[str, PurlConfig]) and add a model validator on PackageSettings (e.g. validate_purl or root_validator) that enforces the two modes are mutually exclusive and normalizes/validates the value (reject combinations of a raw string with other purl override fields); update or add a regression test asserting that a package override like purl: "pkg:github/org/repo@1.0.0" validates successfully and that mixed forms are rejected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fromager/sbom.py`:
- Around line 139-150: upstream_entry currently reuses the downstream wheel's
name/version when building the upstream SPDX entry; change it so
upstream_entry["name"] and ["versionInfo"] are taken from the effective upstream
identity (the resolved upstream package identity in purl.upstream /
upstream_purl) instead of the wheel metadata. In practice: detect if
purl.upstream is present (or if an upstream identity variable exists), extract
the upstream name and version from that identity and assign those to
upstream_entry["name"] and upstream_entry["versionInfo"], falling back to the
existing name/version only when no upstream identity is available; keep
upstream_entry["externalRefs"][...]["referenceLocator"] as upstream_purl.
---
Duplicate comments:
In `@src/fromager/packagesettings/_models.py`:
- Around line 72-117: PackageSettings currently only accepts PurlConfig so
restore the dual-mode by making the PackageSettings.purl field accept either a
raw purl string or a PurlConfig instance (e.g. Union[str, PurlConfig]) and add a
model validator on PackageSettings (e.g. validate_purl or root_validator) that
enforces the two modes are mutually exclusive and normalizes/validates the value
(reject combinations of a raw string with other purl override fields); update or
add a regression test asserting that a package override like purl:
"pkg:github/org/repo@1.0.0" validates successfully and that mixed forms are
rejected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 942b260b-621c-4251-b5c2-2bff7a203110
📒 Files selected for processing (9)
pyproject.tomlsrc/fromager/packagesettings/__init__.pysrc/fromager/packagesettings/_models.pysrc/fromager/packagesettings/_pbi.pysrc/fromager/sbom.pytests/conftest.pytests/test_sbom.pytests/testdata/sbom-context/overrides/settings.yamltests/testdata/sbom-context/overrides/settings/test_purl_fields.yaml
✅ Files skipped from review due to trivial changes (4)
- tests/testdata/sbom-context/overrides/settings.yaml
- tests/testdata/sbom-context/overrides/settings/test_purl_fields.yaml
- pyproject.toml
- src/fromager/packagesettings/init.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/conftest.py
- src/fromager/packagesettings/_pbi.py
…e identification Use the packageurl-python library to construct purls instead of manual string building. Introduce a PurlConfig model that consolidates all purl-related per-package settings into a single field. When set to a string, it is used as the full downstream purl. When set to a PurlConfig object, individual fields (type, namespace, name, version, repository_url, upstream) override specific purl components while defaulting the rest from global SbomSettings. Add upstream source identification to the SBOM. Each document now contains two package entries linked by a GENERATED_FROM relationship: - SPDXRef-wheel: the downstream wheel with repository_url qualifier - SPDXRef-upstream: the original source package without qualifiers The upstream purl is auto-derived by stripping repository_url from the downstream purl. For packages sourced from GitHub/GitLab, an explicit upstream purl can be set via PurlConfig.upstream. Add repository_url to SbomSettings as a global purl qualifier (e.g. ?repository_url=https://packages.redhat.com) added to every downstream purl. Per-package PurlConfig.repository_url overrides it. Refactored tests to use static test data for both global and per-package settings files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Martin Prpič <mprpic@redhat.com>
de64f57 to
cd76fa5
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_sbom.py (1)
173-190: Avoid making relationship order part of the contract.This test is stricter than the feature requires. A harmless reordering of
relationshipswould fail it even if the SPDX content is still correct.Suggested assertion
- rels = doc["relationships"] - assert len(rels) == 2 - assert rels[0]["spdxElementId"] == "SPDXRef-DOCUMENT" - assert rels[0]["relationshipType"] == "DESCRIBES" - assert rels[0]["relatedSpdxElement"] == "SPDXRef-wheel" - assert rels[1]["spdxElementId"] == "SPDXRef-wheel" - assert rels[1]["relationshipType"] == "GENERATED_FROM" - assert rels[1]["relatedSpdxElement"] == "SPDXRef-upstream" + assert { + ( + rel["spdxElementId"], + rel["relationshipType"], + rel["relatedSpdxElement"], + ) + for rel in doc["relationships"] + } == { + ("SPDXRef-DOCUMENT", "DESCRIBES", "SPDXRef-wheel"), + ("SPDXRef-wheel", "GENERATED_FROM", "SPDXRef-upstream"), + } _validate_spdx(doc)As per coding guidelines: "
tests/**: Verify test actually tests the intended behavior. Check for missing edge cases."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_sbom.py` around lines 173 - 190, The test test_generate_sbom_relationships currently assumes a stable order in doc["relationships"] which is fragile; change it to assert relationships irrespective of order by locating entries by their relationshipType or spdxElementId (e.g., find the item where relationshipType == "DESCRIBES" and assert its spdxElementId == "SPDXRef-DOCUMENT" and relatedSpdxElement == "SPDXRef-wheel", and find the item where relationshipType == "GENERATED_FROM" and assert its spdxElementId == "SPDXRef-wheel" and relatedSpdxElement == "SPDXRef-upstream") or by comparing a set of expected (spdxElementId, relationshipType, relatedSpdxElement) tuples against the actual relationships; update test_generate_sbom_relationships to use these order-independent checks and keep the existing call to _validate_spdx(doc).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_sbom.py`:
- Around line 62-78: The current test
test_generate_sbom_repository_url_qualifier only covers the global
repository_url qualifier and one downstream override; add assertions and/or an
additional test case that covers (1) a per-package repository_url override in
the input package metadata (use make_sbom_ctx to create a package entry with
per-package repository_url) so that the downstream package purl uses the
package-level repository_url instead of the global SbomSettings.repository_url,
and (2) verify that any field overrides (e.g., repository_url set on the
downstream/package entry) are propagated into the derived upstream package purl
produced by sbom.generate_sbom (inspect doc["packages"] entries for both
downstream and upstream and assert their externalRefs[*].referenceLocator values
reflect the per-package override). Use the same helpers (Requirement, Version,
make_sbom_ctx, SbomSettings, sbom.generate_sbom) and _validate_spdx to keep
consistency.
---
Nitpick comments:
In `@tests/test_sbom.py`:
- Around line 173-190: The test test_generate_sbom_relationships currently
assumes a stable order in doc["relationships"] which is fragile; change it to
assert relationships irrespective of order by locating entries by their
relationshipType or spdxElementId (e.g., find the item where relationshipType ==
"DESCRIBES" and assert its spdxElementId == "SPDXRef-DOCUMENT" and
relatedSpdxElement == "SPDXRef-wheel", and find the item where relationshipType
== "GENERATED_FROM" and assert its spdxElementId == "SPDXRef-wheel" and
relatedSpdxElement == "SPDXRef-upstream") or by comparing a set of expected
(spdxElementId, relationshipType, relatedSpdxElement) tuples against the actual
relationships; update test_generate_sbom_relationships to use these
order-independent checks and keep the existing call to _validate_spdx(doc).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f25aa677-cae5-484c-8b04-8fd9e690c85e
📒 Files selected for processing (9)
pyproject.tomlsrc/fromager/packagesettings/__init__.pysrc/fromager/packagesettings/_models.pysrc/fromager/packagesettings/_pbi.pysrc/fromager/sbom.pytests/conftest.pytests/test_sbom.pytests/testdata/sbom-context/overrides/settings.yamltests/testdata/sbom-context/overrides/settings/test_purl_fields.yaml
✅ Files skipped from review due to trivial changes (4)
- pyproject.toml
- tests/testdata/sbom-context/overrides/settings/test_purl_fields.yaml
- tests/testdata/sbom-context/overrides/settings.yaml
- src/fromager/packagesettings/init.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/conftest.py
- src/fromager/packagesettings/_models.py
| def test_generate_sbom_repository_url_qualifier(tmp_path: pathlib.Path) -> None: | ||
| """Verify global repository_url adds qualifier to downstream but not upstream.""" | ||
| settings = SbomSettings(repository_url="https://packages.redhat.com") | ||
| ctx = make_sbom_ctx(tmp_path, sbom_settings=settings) | ||
| doc = sbom.generate_sbom( | ||
| ctx=ctx, | ||
| req=Requirement("numpy==1.26.0"), | ||
| version=Version("1.26.0"), | ||
| ) | ||
|
|
||
| wheel = doc["packages"][0] | ||
| upstream = doc["packages"][1] | ||
| assert wheel["externalRefs"][0]["referenceLocator"] == ( | ||
| "pkg:pypi/numpy@1.26.0?repository_url=https://packages.redhat.com" | ||
| ) | ||
| assert upstream["externalRefs"][0]["referenceLocator"] == "pkg:pypi/numpy@1.26.0" | ||
| _validate_spdx(doc) |
There was a problem hiding this comment.
Cover the remaining override branches.
These tests prove the global qualifier path and one downstream field override, but they still miss two core cases from this PR: per-package repository_url overriding the global value, and field overrides carrying through to the derived upstream purl. Both could regress without failing the suite.
Suggested test additions
def test_generate_sbom_purl_field_overrides(tmp_path: pathlib.Path) -> None:
"""Verify individual purl field overrides work."""
ctx = make_sbom_ctx(
tmp_path,
sbom_settings=SbomSettings(),
package_overrides={"purl": {"type": "generic", "name": "custom-name"}},
)
doc = sbom.generate_sbom(
ctx=ctx,
req=Requirement("test-pkg==1.0.0"),
version=Version("1.0.0"),
)
wheel = doc["packages"][0]
+ upstream = doc["packages"][1]
assert wheel["externalRefs"][0]["referenceLocator"] == (
"pkg:generic/custom-name@1.0.0"
)
+ assert upstream["externalRefs"][0]["referenceLocator"] == (
+ "pkg:generic/custom-name@1.0.0"
+ )
_validate_spdx(doc)
+
+
+def test_generate_sbom_package_repository_url_override(tmp_path: pathlib.Path) -> None:
+ """Verify per-package repository_url overrides the global value."""
+ ctx = make_sbom_ctx(
+ tmp_path,
+ sbom_settings=SbomSettings(repository_url="https://packages.redhat.com"),
+ package_overrides={
+ "purl": {"repository_url": "https://mirror.example.com/simple"},
+ },
+ )
+ doc = sbom.generate_sbom(
+ ctx=ctx,
+ req=Requirement("numpy==1.26.0"),
+ version=Version("1.26.0"),
+ )
+
+ wheel = doc["packages"][0]
+ upstream = doc["packages"][1]
+ assert wheel["externalRefs"][0]["referenceLocator"] == (
+ "pkg:pypi/numpy@1.26.0?repository_url=https://mirror.example.com/simple"
+ )
+ assert upstream["externalRefs"][0]["referenceLocator"] == "pkg:pypi/numpy@1.26.0"As per coding guidelines: "tests/**: Verify test actually tests the intended behavior. Check for missing edge cases."
Also applies to: 106-123
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_sbom.py` around lines 62 - 78, The current test
test_generate_sbom_repository_url_qualifier only covers the global
repository_url qualifier and one downstream override; add assertions and/or an
additional test case that covers (1) a per-package repository_url override in
the input package metadata (use make_sbom_ctx to create a package entry with
per-package repository_url) so that the downstream package purl uses the
package-level repository_url instead of the global SbomSettings.repository_url,
and (2) verify that any field overrides (e.g., repository_url set on the
downstream/package entry) are propagated into the derived upstream package purl
produced by sbom.generate_sbom (inspect doc["packages"] entries for both
downstream and upstream and assert their externalRefs[*].referenceLocator values
reflect the per-package override). Use the same helpers (Requirement, Version,
make_sbom_ctx, SbomSettings, sbom.generate_sbom) and _validate_spdx to keep
consistency.
Pull Request Description
What
Use the packageurl-python library to construct purls instead of manual string building. Support two modes for purl specification in package settings:
purlfield) used as-is, e.g.pkg:generic/my-fork@1.0.0purl_type,purl_namespace,purl_name,purl_version) that override the default construction from global settings and package identityThe two modes are mutually exclusive, enforced by a model validator. A default purl (
pkg:pypi/<name>@<version>) is now always generated. The globalpurl_typesetting (default:pypi) controls the default type for all packages.Add
repository_urltoSbomSettingsas a global purl qualifier (e.g.?repository_url=https://packages.redhat.com) that is added to every downstream purl. Per-packagerepository_urloverrides the global value.Downstream purl construction cascades:
per-package
purl(full override) >per-package field overrides (
purl_type, etc.) >global defaults from
SbomSettingsWhy
This allows us to define relationships between upstream and downstream components as well as have a better way to control how the downstream purl is constructed (in case our wheel has a different name, version, or other attributes). See also #1031 for more examples.
Closes #1031