ci(release): prune cloud builder cache before building#968
Conversation
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The release job builds 7 image variants on a single shared Docker Build Cloud builder. Accumulated cache from previous runs eventually fills the builder's disk, surfacing as "no space left on device" while unpacking the (growing) upstream llama.cpp image snapshots — which is what broke the b9592 release. Add a `docker buildx prune -af` step right after Set up Buildx so each release starts with a clean builder disk. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request updates the LLAMA_SERVER_VERSION to b9592 in .versions and the Dockerfile, alongside updating the llama.cpp submodule commit. However, the reviewer correctly identified that the workflow changes to add a docker buildx prune step, which were intended to resolve a ResourceExhausted cache issue as per the PR description, are missing from the changes.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
|
||
| ARG GO_VERSION=1.25 | ||
| ARG LLAMA_SERVER_VERSION=b9501 | ||
| ARG LLAMA_SERVER_VERSION=b9592 |
There was a problem hiding this comment.
The pull request title and description state that a docker buildx prune -af step is being added to the release workflow to resolve the ResourceExhausted cache issue. However, the actual changes in this PR only consist of version bumps in .versions, Dockerfile, and the llama.cpp submodule. The CI/CD workflow file containing the prune step is missing from this pull request. Please include the workflow changes to ensure the cache is pruned before building.
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="Dockerfile" line_range="4-6" />
<code_context>
+ARG LLAMA_SERVER_VERSION=b9592
ARG LLAMA_SERVER_VARIANT=cpu
-ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9501
+ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9592
ARG VERSION=dev
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Reduce the risk of version skew between `LLAMA_SERVER_VERSION` and `LLAMA_UPSTREAM_IMAGE`.
Both variables currently encode `b9592` separately. To prevent future mismatches, derive `LLAMA_UPSTREAM_IMAGE` from `LLAMA_SERVER_VERSION` (e.g., `ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/...:server-vulkan-${LLAMA_SERVER_VERSION}`) or otherwise ensure a single source of truth for this version.
```suggestion
ARG LLAMA_SERVER_VERSION=b9592
ARG LLAMA_SERVER_VARIANT=cpu
ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-${LLAMA_SERVER_VERSION}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| ARG LLAMA_SERVER_VERSION=b9592 | ||
| ARG LLAMA_SERVER_VARIANT=cpu | ||
| ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9501 | ||
| ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9592 |
There was a problem hiding this comment.
suggestion (bug_risk): Reduce the risk of version skew between LLAMA_SERVER_VERSION and LLAMA_UPSTREAM_IMAGE.
Both variables currently encode b9592 separately. To prevent future mismatches, derive LLAMA_UPSTREAM_IMAGE from LLAMA_SERVER_VERSION (e.g., ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/...:server-vulkan-${LLAMA_SERVER_VERSION}) or otherwise ensure a single source of truth for this version.
| ARG LLAMA_SERVER_VERSION=b9592 | |
| ARG LLAMA_SERVER_VARIANT=cpu | |
| ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9501 | |
| ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b9592 | |
| ARG LLAMA_SERVER_VERSION=b9592 | |
| ARG LLAMA_SERVER_VARIANT=cpu | |
| ARG LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-${LLAMA_SERVER_VERSION} |
Context
The release for the llama.cpp
b9501 → b9592bump failed in thebuildjob with:The error fires on the shared Docker Build Cloud builder (
driver: cloud, endpointdocker/make-product-smarter) while unpacking an upstream llama.cpp image snapshot.Investigation
The upstream
ghcr.io/ggml-org/llama.cppimages grew notably across this bump (compressedlinux/amd64layers, measured against GHCR):server-vulkan(cpu)server-openvinoserver-musaserver-cuda13server-rocmRoot cause: the release builds 7 variants (cpu/cuda on amd64+arm64) on one shared cloud builder with no cache cleanup. Cache accumulated across runs filled the disk; the ~1 GB of compressed image growth (several GB uncompressed across all variants × platforms) was the final push.
Change
Add a
docker buildx prune -afstep right after Set up Buildx so every release starts with a clean builder disk.🤖 Generated with Claude Code