Skip to content

Conversation

@ilopezluna
Copy link
Contributor

These PR includes 3 changes:

  1. Added a test to pull a model from Docker Hub registry as the integration tests we had didn't catch a pulling issue in PR to remove ggcr.
    The error was not catch because the local test registry (registry:3) doesn't need hostname remapping - registry.local:5000 works directly. However Docker Hub's actual V2 API is served from registry-1.docker.io so we need this remapping (which was automatically handled by ggcr).
  2. Fixes pulling from OCI registries by adding the mapping of the registry
  3. Fixes pulling from HF by using the /manifests/ endpoint instead of /blob/ (which is not implemented in HF):

For example if we try to get the manifest of qwen3 via /blobs/ endpoint:

curl -s -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  "https://huggingface.co/v2/unsloth/Qwen3-0.6B-GGUF/manifests/latest" | sha256sum
2c90981467c21b4f872acc2b537bc8e1d4c30f07c8efb3a5981a3e73153eb99c  -

We get a 404:

curl -I "https://huggingface.co/v2/unsloth/Qwen3-0.6B-GGUF/blobs/sha256:2c90981467c21b4f872acc2b537bc8e1d4c30f07c8efb3a5981a3e73153eb99c"
HTTP/2 404

Via /manifests/ works well:

curl -I -H "Accept: application/vnd.oci.image.manifest.v1+json" \
  "https://huggingface.co/v2/unsloth/Qwen3-0.6B-GGUF/manifests/latest"
HTTP/2 200 

And any layer via blob also works:

curl -I "https://huggingface.co/v2/unsloth/Qwen3-0.6B-GGUF/blobs/sha256:4e0cc56d924a70ceac12429e716f1da1db9fe6ac870fdef0ced55abc6d3d40d4"
HTTP/2 200 

@ilopezluna ilopezluna requested a review from a team January 8, 2026 11:32
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @ilopezluna, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces crucial fixes and a new integration test to restore and ensure proper model pulling functionality from various OCI-compliant registries, particularly Docker Hub and Hugging Face, following the removal of the ggcr library. The changes address hostname remapping for Docker Hub and adapt manifest fetching strategies for Hugging Face, which uses a different API endpoint for manifest-like content.

Highlights

  • New Integration Test for Docker Hub Pulls: A new integration test (TestIntegration_PullFromDockerHub) has been added to verify that pulling models from Docker Hub works correctly. This test specifically addresses issues related to hostname remapping that were not caught by existing local registry tests, ensuring compatibility with registry-1.docker.io.
  • Fix for OCI Registry Hostname Remapping: The system now correctly remaps Docker Hub hostnames (e.g., docker.io, index.docker.io) to the actual V2 API endpoint registry-1.docker.io. This resolves issues encountered when pulling from OCI registries like Docker Hub after the removal of ggcr.
  • Fix for Hugging Face Manifest Pulls: The manifest fetching logic has been updated to correctly interact with Hugging Face registries. It now prioritizes the /manifests/ endpoint for all JSON-like content, as Hugging Face does not implement the /blobs/ endpoint for manifest-like data. Additionally, when fetching manifests via this endpoint, it now explicitly requests the 'latest' tag.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • In manifestFetcher.Fetch, the manifest URL now hardcodes "latest" instead of using desc.Digest.String(), which breaks content-addressable behavior and will fail for non-latest tags or digest-based pulls; this should continue to use the descriptor’s digest (or tag when appropriate) rather than a constant.
  • The new TestIntegration_PullFromDockerHub relies on the real Docker Hub and a specific public image, which may introduce flakiness or rate-limit issues in CI; consider adding a way to skip or gate this test in constrained environments (e.g., via a build tag or env flag).
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `manifestFetcher.Fetch`, the manifest URL now hardcodes `"latest"` instead of using `desc.Digest.String()`, which breaks content-addressable behavior and will fail for non-`latest` tags or digest-based pulls; this should continue to use the descriptor’s digest (or tag when appropriate) rather than a constant.
- The new `TestIntegration_PullFromDockerHub` relies on the real Docker Hub and a specific public image, which may introduce flakiness or rate-limit issues in CI; consider adding a way to skip or gate this test in constrained environments (e.g., via a build tag or env flag).

## Individual Comments

### Comment 1
<location> `pkg/distribution/oci/remote/remote.go:383-387` </location>
<code_context>
+	// Docker Hub uses several aliases but the V2 API is served from registry-1.docker.io
+	registryHost := dockerHubAPIHost(registry.RegistryStr())
+
 	url := fmt.Sprintf("%s://%s/v2/%s/manifests/%s",
 		scheme,
-		registry.RegistryStr(),
+		registryHost,
 		repo,
-		desc.Digest.String())
+		"latest")

 	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
</code_context>

<issue_to_address>
**issue (bug_risk):** Using the hard-coded "latest" tag instead of the descriptor digest is likely incorrect and breaks content-addressable fetching.

This previously used `desc.Digest.String()` to align with the `/manifests/<digest>` behavior described above. Switching to the hard-coded `"latest"` ignores the requested descriptor and always fetches the latest tag, which can return the wrong manifest, cause cache inconsistencies, and make behavior non-deterministic. Unless there’s a strong requirement to use tags here, this should keep using `desc.Digest.String()` (or another identifier derived from `desc`) instead of a fixed tag.
</issue_to_address>

### Comment 2
<location> `cmd/cli/commands/integration_test.go:1188-1194` </location>
<code_context>
+	}
+}
+
+// TestIntegration_PullFromDockerHub is a smoke test that pulls a real model
+// from Docker Hub to verify that the OCI registry code works correctly
+// with the real Docker Hub registry (index.docker.io -> registry-1.docker.io).
+//
+// This test catches regressions where the code doesn't properly handle
+// Docker Hub's hostname remapping requirements.
+func TestIntegration_PullFromDockerHub(t *testing.T) {
+	env := setupDockerHubTestEnv(t)
+
</code_context>

<issue_to_address>
**suggestion (testing):** Consider gating this Docker Hub integration test behind an opt-in flag or build tag to avoid flakiness and external dependency issues in CI.

Because this test relies on external network access and Docker Hub availability (`ai/smollm2:135M-Q4_0` staying public, rate limits, outages), it can make CI flaky or unusable in restricted environments. Options:
- Skip by default unless an env var is set (e.g. `if os.Getenv("DMR_DOCKER_HUB_TEST") != "1" { t.Skip(...) }`).
- Guard it with a build tag like `//go:build integration` so it only runs in an integration pipeline.
This keeps the regression coverage while ensuring the default test suite remains stable and deterministic.
</issue_to_address>

### Comment 3
<location> `cmd/cli/commands/integration_test.go:1212-1216` </location>
<code_context>
+	err = pullModel(newPullCmd(), env.client, modelRef)
+	require.NoError(t, err, "Failed to pull model from Docker Hub: %s", modelRef)
+
+	// Verify the model was pulled
+	t.Log("Verifying model was pulled successfully")
+	models, err = listModels(false, env.client, true, false, "")
+	require.NoError(t, err)
+	require.NotEmpty(t, strings.TrimSpace(models), "Model should exist after pull from Docker Hub")
+
+	// Verify we can inspect the model
</code_context>

<issue_to_address>
**suggestion (testing):** Strengthen the assertion to verify the pulled model reference matches the expected Docker Hub-normalized form, not just that some model exists.

Currently this only verifies that *some* model string is returned, so it would still pass if a different registry or tag were used. To actually validate the hostname remapping behavior, assert that the normalized reference you expect appears in the output, e.g.

```go
require.Contains(t, models, "docker.io/ai/smollm2:135M-Q4_0")
```
(adjusted to match what `normalizeRef` produces).

```suggestion
	// Verify the model was pulled
	t.Log("Verifying model was pulled successfully")
	models, err = listModels(false, env.client, true, false, "")
	require.NoError(t, err)

	// Strengthen assertion: ensure the normalized Docker Hub reference is present
	normalizedRef := "docker.io/" + modelRef
	require.Contains(t, models, normalizedRef, "Expected pulled model %s to appear in model list output", normalizedRef)
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@ilopezluna ilopezluna marked this pull request as draft January 8, 2026 11:34
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces important fixes for pulling models from Docker Hub and HuggingFace registries after the removal of ggcr. The addition of an integration test for Docker Hub is a valuable improvement to catch registry-specific issues. The refactoring of the test setup to accommodate different container configurations is also well-executed.

The changes in pkg/distribution/oci/remote/remote.go correctly implement hostname remapping for Docker Hub. However, I've identified a critical issue where the manifest fetch logic has been hardcoded to use the "latest" tag. This will break the functionality for pulling any model version that is not tagged as latest, as well as pulling by digest. My review includes a suggestion to correct this behavior.

@ilopezluna ilopezluna marked this pull request as ready for review January 8, 2026 11:44
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In manifestFetcher.Fetch, the new HuggingFace logic (shouldUseManifestEndpoint) now routes any JSON-like descriptor through the /manifests/ path and hardcodes latest, which breaks digest-based fetches and non-manifest JSON layers; consider keeping the original isManifestMediaType gate and using desc.Digest.String() instead of the literal "latest".
  • The Docker Hub hostname remapping via dockerHubAPIHost is only applied in the /manifests/ code path; if there are other calls that still use registry.RegistryStr() directly (e.g., blob fetches), ensure they also apply the same mapping if needed to avoid inconsistent behavior between manifest and blob requests.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `manifestFetcher.Fetch`, the new HuggingFace logic (`shouldUseManifestEndpoint`) now routes any JSON-like descriptor through the `/manifests/` path and hardcodes `latest`, which breaks digest-based fetches and non-manifest JSON layers; consider keeping the original `isManifestMediaType` gate and using `desc.Digest.String()` instead of the literal `"latest"`.
- The Docker Hub hostname remapping via `dockerHubAPIHost` is only applied in the `/manifests/` code path; if there are other calls that still use `registry.RegistryStr()` directly (e.g., blob fetches), ensure they also apply the same mapping if needed to avoid inconsistent behavior between manifest and blob requests.

## Individual Comments

### Comment 1
<location> `pkg/distribution/oci/remote/remote.go:361-364` </location>
<code_context>
+
+	// For HuggingFace, try /manifests/ first for any JSON-like content
+	// since they don't serve manifests via /blobs/ at all
+	shouldUseManifestEndpoint := isHF && (desc.MediaType == "application/json" || strings.Contains(desc.MediaType, "+json"))
+
+	// For non-manifest content on non-HF registries, use the underlying fetcher
+	if !shouldUseManifestEndpoint {
 		return f.underlying.Fetch(ctx, desc)
 	}
</code_context>

<issue_to_address>
**issue (bug_risk):** Manifest fetches for non-HuggingFace registries no longer use the /manifests/ endpoint.

Previously, `Fetch` treated any `isManifestMediaType(desc.MediaType)` as a `/manifests/` request across all registries, which allowed non-HF registries that don’t expose manifests via `/blobs/` to work. Now only HF + JSON-like types use `/manifests/`, and all other manifest media types fall back to `/blobs/`, which appears to regress that behavior. Please keep the old `isManifestMediaType` behavior for non-HF registries and add the HF special-case on top, e.g.:

```go
shouldUseManifestEndpoint := (isHF && jsonLike) || (!isHF && isManifestMediaType(desc.MediaType))
```
</issue_to_address>

### Comment 2
<location> `pkg/distribution/oci/remote/remote.go:382-386` </location>
<code_context>
+	// Docker Hub uses several aliases but the V2 API is served from registry-1.docker.io
+	registryHost := dockerHubAPIHost(registry.RegistryStr())
+
 	url := fmt.Sprintf("%s://%s/v2/%s/manifests/%s",
 		scheme,
-		registry.RegistryStr(),
+		registryHost,
 		repo,
-		desc.Digest.String())
+		"latest")

 	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
</code_context>

<issue_to_address>
**issue (bug_risk):** Using "latest" instead of the descriptor digest breaks content-addressable fetching.

Previously this URL used `desc.Digest.String()`, guaranteeing we fetch the exact manifest referenced by the descriptor. Hardcoding `"latest"` means:

- We may fetch a different manifest than the descriptor identifies.
- Digest-based or non-`latest` references won’t work correctly.
- The `Fetch(ctx, desc)` contract of being content-addressable is broken.

Unless this function is intentionally tag-based and always meant to resolve `latest`, it should derive the reference from `desc`/`f.ref` instead of hardcoding a tag.
</issue_to_address>

### Comment 3
<location> `pkg/distribution/oci/remote/remote.go:347-348` </location>
<code_context>
+
+// isHuggingFaceRegistry returns true if the host is a HuggingFace registry.
+// HuggingFace doesn't serve manifests via /blobs/ endpoint, only via /manifests/.
+func isHuggingFaceRegistry(host string) bool {
+	return strings.Contains(host, "huggingface.co") || strings.Contains(host, "hf.co")
+}
+
</code_context>

<issue_to_address>
**suggestion (bug_risk):** HuggingFace host detection via substring matching may overmatch unintended hosts.

Because this uses `strings.Contains`, a host like `myhuggingface.co.evil.com` or `foo-hf.co.example.org` would be treated as HuggingFace, which could incorrectly send requests through the HuggingFace-specific path. Consider matching only exact hosts and valid subdomains instead, e.g. `host == "huggingface.co" || strings.HasSuffix(host, ".huggingface.co")` (and similarly for `hf.co`).

Suggested implementation:

```golang
 // isHuggingFaceRegistry returns true if the host is a HuggingFace registry.
 // HuggingFace doesn't serve manifests via /blobs/ endpoint, only via /manifests/.
 func isHuggingFaceRegistry(host string) bool {
 	if host == "huggingface.co" || strings.HasSuffix(host, ".huggingface.co") {
 		return true
 	}
 	if host == "hf.co" || strings.HasSuffix(host, ".hf.co") {
 		return true
 	}
 	return false
 }

 // dockerHubAPIHost remaps Docker Hub registry hostnames to the actual API endpoint.
 // Docker Hub uses several aliases (docker.io, index.docker.io) but the actual
 // V2 API is served from registry-1.docker.io.
 func dockerHubAPIHost(host string) string {
 	switch host {
 	case "docker.io", "index.docker.io":
 		return "registry-1.docker.io"
 	default:
 		return host
 	}
 }

```

`).

Here are the code changes:

<file_operations>
<file_operation operation="edit" file_path="pkg/distribution/oci/remote/remote.go">
<<<<<<< SEARCH
// isHuggingFaceRegistry returns true if the host is a HuggingFace registry.
// HuggingFace doesn't serve manifests via /blobs/ endpoint, only via /manifests/.

// dockerHubAPIHost remaps Docker Hub registry hostnames to the actual API endpoint.
// Docker Hub uses several aliases (docker.io, index.docker.io) but the actual
// V2 API is served from registry-1.docker.io.
func dockerHubAPIHost(host string) string {
	switch host {
	case "docker.io", "index.docker.io":
		return "registry-1.docker.io"
	default:
		return host
	}
}
=======
 // isHuggingFaceRegistry returns true if the host is a HuggingFace registry.
 // HuggingFace doesn't serve manifests via /blobs/ endpoint, only via /manifests/.
 func isHuggingFaceRegistry(host string) bool {
 	if host == "huggingface.co" || strings.HasSuffix(host, ".huggingface.co") {
 		return true
 	}
 	if host == "hf.co" || strings.HasSuffix(host, ".hf.co") {
 		return true
 	}
 	return false
 }

 // dockerHubAPIHost remaps Docker Hub registry hostnames to the actual API endpoint.
 // Docker Hub uses several aliases (docker.io, index.docker.io) but the actual
 // V2 API is served from registry-1.docker.io.
 func dockerHubAPIHost(host string) string {
 	switch host {
 	case "docker.io", "index.docker.io":
 		return "registry-1.docker.io"
 	default:
 		return host
 	}
 }
>>>>>>> REPLACE
</file_operation>
</file_operations>

<additional_changes>
1. Ensure this file imports the `strings` package:

   ```go
   import (
       // existing imports...
       "strings"
   )
   ```

   If `strings` is already imported, no further change is needed.
</issue_to_address>

### Comment 4
<location> `pkg/distribution/oci/remote/remote.go:363-364` </location>
<code_context>
+	// since they don't serve manifests via /blobs/ at all
+	shouldUseManifestEndpoint := isHF && (desc.MediaType == "application/json" || strings.Contains(desc.MediaType, "+json"))
+
+	// For non-manifest content on non-HF registries, use the underlying fetcher
+	if !shouldUseManifestEndpoint {
 		return f.underlying.Fetch(ctx, desc)
 	}
</code_context>

<issue_to_address>
**nitpick:** The comment describing the conditional no longer matches the implemented logic.

The current wording implies this only applies to non-manifest content on non-HF registries, but `!shouldUseManifestEndpoint` also covers:

- All requests to non-HF registries (including manifest content), and
- HF registries where the media type is not JSON-like.

Please update either the comment or the condition so they describe the same behavior for when `/manifests/` vs `/blobs/` is used.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@ilopezluna ilopezluna merged commit b7631f5 into rm-gocontainers-registry Jan 8, 2026
5 checks passed
@ilopezluna ilopezluna deleted the rm-gocontainers-registry-fix branch January 8, 2026 18:31
ilopezluna added a commit that referenced this pull request Jan 8, 2026
* Replace go-containerregistry with containerd/moby

This commit removes the vendored go-containerregistry package and replaces
it with containerd and moby packages for OCI registry operations.

Signed-off-by: Eric Curtin <eric.curtin@docker.com>

* fix for removing ggcr (#545)

* Add Docker Hub integration tests and hostname remapping for API requests

* Refactor Docker Model Runner setup to use a configuration struct for environment variables and log messages

* Update go.mod dependencies to include latest versions of cloud.google.com/go/compute/metadata and golang.org/x/oauth2

* Add support for HuggingFace registry in manifest fetching

* Refactor manifest endpoint logic to apply it only on HF repos

* Remove Docker Hub hostname remapping and update manifest fetching logic for HuggingFace registry

* Refactor manifest endpoint check for HuggingFace to use dedicated media type function

---------

Signed-off-by: Eric Curtin <eric.curtin@docker.com>
Co-authored-by: Ignasi <ignasi.lopez.luna@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants