fix(llamacpp): resolve backend tag via registry API to honor mirrors#974
fix(llamacpp): resolve backend tag via registry API to honor mirrors#974ilopezluna wants to merge 2 commits into
Conversation
Resolve the llama.cpp backend tag to a digest through the containerd registry resolver instead of the Docker Hub HTTP API. This honors configured registry mirrors (corporate Artifactory / Nexus / Harbor) and `docker login` credentials, so customers behind a private mirror with no direct egress to registry-1.docker.io can resolve and pull the backend. Hardening from review: - retry() now short-circuits terminal errors (not-found, unauthorized, canceled/expired context) instead of looping 10x with 1s sleeps, so a missing tag or bad credentials no longer blocks startup ~9s. Applies to both ResolveDigest and PullPlatform. - Extract a shared newResolver() helper; build the resolver once outside the retry loop rather than rebuilding (and re-reading config.json) per attempt. - Log the resolved digest and the host that served it, to diagnose a re-download loop behind a mirror that re-pushes a converted image. - Drop the now-dead *http.Client parameter from the llama.cpp download path and the unreachable empty-digest branch. - Add goleak and tests for mirror resolution and terminal-error fast-fail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request refactors the llamacpp backend to resolve image tags to digests using the Registry HTTP API v2, which honors registry mirrors and local docker credentials. It also introduces a fast-fail mechanism for terminal errors during retries and adds comprehensive unit tests. The feedback highlights a critical issue where the resolver is still recreated on every retry in PullPlatform, and suggests expanding the terminal error checks to include 403 Forbidden (errdefs.IsDenied) to fail fast on permission errors.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Address review feedback on PR #974: - PullPlatform now builds the resolver once and passes it into fetch(), instead of fetch() rebuilding it (and re-reading ~/.docker/config.json) on every retry attempt. - isTerminal() now fails fast on 401/403. The containerd resolver only maps 404 to errdefs.ErrNotFound; other 4xx statuses surface as remoteerrors.ErrUnexpectedStatus carrying the raw code, so we inspect StatusCode via errors.As (covering both 401 and 403). 429 is left retryable, matching the resolver's internal rate-limit handling. - Add hermetic retry() tests for 401/403 fast-fail and 429 retry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Context
downloadLatestLlamaCppresolved the llama.cpp backend tag to a digest by calling the Docker Hub HTTP API (hub.docker.com/v2/.../tags/<tag>). That endpoint is only reachable from the public internet, so customers behind a corporate registry mirror (Artifactory / Nexus / Harbor) with no direct egress to Docker Hub could not resolve the backend image even though the mirror could serve it.This change resolves the tag through the containerd registry resolver instead. Resolution now goes through the same
registryutil.RegistryHostspath as the image pull, honoring configured mirrors anddocker logincredentials.Changes
dockerhub.ResolveDigest(ctx, ref, mirrors)resolves a tag to a digest using the registry resolver (manifest HEAD/GET only, no blobs).downloadLatestLlamaCppuses it instead of the Hub HTTP API.Hardening from review
retry()now short-circuits non-retryable errors (errdefs.IsNotFound/IsUnauthorized/ canceled / deadline-exceeded) instead of looping 10×1s (~9s). A missing tag or bad credentials no longer stalls the install/startup path. Applies to bothResolveDigestandPullPlatform.newResolver()helper (reused byfetch()); the resolver is built once outside the retry loop rather than rebuilt and re-reading~/.docker/config.jsonon every attempt.*http.Clientparameter from the llama.cpp download path (the sharedInstallinterface boundary is unchanged) and removed the unreachable empty-digest branch.Tests
goleakTestMainto catch goroutine/connection leaks.retry()tests for terminal-error fast-fail vs. transient-error retry.Verification
go build ./...and per-platformGOOS=darwin/windows/linux go build ./...go veton both packagesgo test ./pkg/internal/dockerhub/...(goleak clean) and./pkg/inference/backends/llamacpp/...🤖 Generated with Claude Code