Skip to content

Support Custom Artifactory Repositories for Package Metadata#258

Open
voidpetal wants to merge 7 commits into
aboutcode-org:mainfrom
voidpetal:fix-custom-artifactory-metadata
Open

Support Custom Artifactory Repositories for Package Metadata#258
voidpetal wants to merge 7 commits into
aboutcode-org:mainfrom
voidpetal:fix-custom-artifactory-metadata

Conversation

@voidpetal
Copy link
Copy Markdown

@voidpetal voidpetal commented Jan 21, 2026

When using --index-url with a custom Artifactory repository, dependency resolution works but the packages array comes back empty. This happens because get_pypi_data_from_purl() hardcodes https://pypi.org/pypi for the JSON API endpoint. Internal packages that don't exist on PyPI.org return 404 and are silently skipped.

The fix includes deriving the JSON API base URL from the provided repository instead of hardcoding PyPI.org
It is also necessary to match distribution files by filename (standardized per PEP 427/491) instead of full URL, since URL paths can differ between Simple API and JSON API endpoints.

@voidpetal voidpetal force-pushed the fix-custom-artifactory-metadata branch 2 times, most recently from 7bae318 to 2321963 Compare January 22, 2026 10:53
Copy link
Copy Markdown
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This PR really looks like it was in good part generated by a coding agent. I appreciate your efforts, but overall, do not force me to review code that you have not thoroughly reviewed yourself, otherwise you are effectively outsourcing AI generation review to me, a human, which could be considered bad form and not super polite or nice.

Time is a precious thing that is not a commodity for me. And reviewing AI slop is a waste of my time.

We are evolving our AI policy but at a high level:

  1. do not use AI to generate commit messages and PR bodies messages. These are used to communicate between people, so be nice and genuine: type these yourself with your own words. Here please amend your commit messages and PR body.
  2. adopt our style for commits messages and code. In particular for tests.
  3. we need doc and changelog updates
  4. do not add unused code. Remove it.

Comment thread tests/test_package_data.py Outdated
Comment thread tests/test_package_data.py Outdated
Comment thread tests/test_package_data.py Outdated
Comment thread tests/test_package_data.py Outdated
Comment thread src/python_inspector/package_data.py Outdated
Comment thread src/python_inspector/package_data.py Outdated
if repos:
# Convert to list if needed and use first repo's index_url
repos_list = list(repos) if not isinstance(repos, list) else repos
base_path = repos_list[0].index_url.replace("/simple", "/pypi")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this doing? this is weird: what if you have 10 repos?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I initially opted out for a half-solution to get the first repository. However now I updated it to try all available repositories until it hits a success. Let me know if this aligns with your expectations.

Comment thread src/python_inspector/package_data.py Outdated
Comment thread src/python_inspector/package_data.py Outdated
for url_entry in response.get("urls") or []:
url = url_entry.get("url")
if url:
# Resolve relative URLs (from Artifactory) to absolute URLs
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have Artifactory examples? and what about nexus or others?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of any public Artifactory URL to share freely.

Comment thread src/python_inspector/package_data.py Outdated

def get_file_match_key(url: str, sha256: Optional[str] = None) -> tuple:
"""
Extract a match key (filename, sha256) for comparing distribution files.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use our comment style: `Return ....

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was unused, sorry I overlooked it..

Comment thread src/python_inspector/package_data.py Outdated
continue

url_data = urls.get(dist_url)
url_data = urls_by_filename[filename]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that this will not fail?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, replaced with urls_by_filename.get(filename)

@voidpetal voidpetal force-pushed the fix-custom-artifactory-metadata branch 3 times, most recently from 3cafbe0 to 672f569 Compare March 11, 2026 10:35
@voidpetal
Copy link
Copy Markdown
Author

voidpetal commented Mar 11, 2026

Hi @pombredanne,

Thanks a lot for your review!
I agree with you and I sincerely apologize for making you feel your time is wasted. I will do my best to take into consideration your advice for the future contributions. Since this is my first contribution, I hope that you can forgive me for the oversight.

I have amended the PR and commit descriptions as you proposed and the amended code is ready for your further review.

Please let me know if you would like me to do anything else!

@voidpetal voidpetal marked this pull request as draft March 11, 2026 13:28
@voidpetal voidpetal force-pushed the fix-custom-artifactory-metadata branch from 672f569 to e5ddeeb Compare March 11, 2026 13:42
@voidpetal voidpetal marked this pull request as ready for review March 11, 2026 13:54
@voidpetal voidpetal changed the title Support Custom PyPI-Compatible Repositories for Package Metadata Support Custom Artifactory Repositories for Package Metadata Mar 11, 2026
@voidpetal voidpetal force-pushed the fix-custom-artifactory-metadata branch 3 times, most recently from 40dbfa4 to 9fdb092 Compare March 12, 2026 07:50
@pombredanne
Copy link
Copy Markdown
Member

@voidpetal we have now #261 that was merged, which has a similar goal. There are a few issues in both cases, as there are cases where #261 and this code here may not work all the times. In particular, Artifactory is peculiar in that it does not support the simple API, but Nexus does, and of course PyPI does too. We will need to refine all that with proper tests as we cannot assume that we have to always a (slow) pypi/ endpoint over a a (faster) simple/ endpoint, especially when not using Artifactory

@voidpetal
Copy link
Copy Markdown
Author

Hi @pombredanne, thanks for letting me know! I checked and it seems that with the new release our issues were resolved. So I am fine also closing this PR.

Let me know if there is something you'd like to keep from this PR.

@voidpetal voidpetal force-pushed the fix-custom-artifactory-metadata branch from 9fdb092 to 0d45a34 Compare May 28, 2026 13:41
voidpetal added 7 commits May 28, 2026 16:13
Artifactory URLs have different path structures than PyPI URLs, but
filenames are identical. When exact URL match fails, fall back to
matching by filename to ensure packages array is populated.

Signed-off-by: Kai Hodžić <hodzic.e.k@outlook.com>
Artifactory might return `project_urls: null` even for public packages.
Now iterates through all index_urls and falls back to PyPI.org to get
VCS metadata when Artifactory has incomplete data.

Signed-off-by: Kai Hodžić <hodzic.e.k@outlook.com>
This helps ORT resolve VCS provenance when artifact download fails.
Populated from code_view_url

Signed-off-by: Kai Hodžić <hodzic.e.k@outlook.com>
Extract sdist URL, hash, and filename from PyPI response and include
in extra_data.source_artifact for downstream consumers that need
source distribution information.

Signed-off-by: Kai Hodžić <hodzic.e.k@outlook.com>
Signed-off-by: Kai Hodžić <hodzic.e.k@outlook.com>
Signed-off-by: Kai Hodžić <hodzic.e.k@outlook.com>
Signed-off-by: Kai Hodžić <hodzic.e.k@outlook.com>
@voidpetal voidpetal force-pushed the fix-custom-artifactory-metadata branch from 8824f77 to 9b10285 Compare May 28, 2026 14:18
@voidpetal
Copy link
Copy Markdown
Author

Hi @pombredanne, actually the issue wasn't fully resolved, but I updated the PR with a fix which resolves the missing metadata issue for us.

In total the PR contains the following fixes:

  1. URL filename matching - Artifactory URLs differ from PyPI URLs (different path structure), but filenames are identical. Now we fallback to filename matching when exact URL match fails, otherwise packages array comes back empty.
  2. VCS URL extraction - vcs_url is now populated from code_view_url. ORT tries both [ARTIFACT, VCS] for provenance resolution, so when artifact fails it falls back to VCS. Without vcs_url, both paths fail.
  3. Artifactory project_urls fallback - Artifactory might return project_urls: null even for public packages. We now continue to PyPI.org to get VCS metadata when Artifactory has incomplete data.
  4. Source artifact metadata - Added extra_data.source_artifact with sdist URL/hash for downstream consumers.

Please let me know what you think!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants