feat: Optimize Docker caching and harden CI cache behavior#295

Merged
BenjaminMichaelis merged 10 commits into main from benjaminmichaelis/dockerfile-cache-optimization
May 19, 2026

@BenjaminMichaelis
Member

This improves Docker build performance and cache stability in CI by separating stable and volatile Docker layers, persisting BuildKit cache mounts, and preventing non-main runs from polluting mainline cache state.

What changed

  • Reworked Dockerfile layering for cache efficiency:
    • Install tools before source copy.
    • Copy restore/build manifests first, then run restore.
    • Copy full source later so source-only edits do not invalidate expensive restore/tool layers.
  • Added BuildKit cache mounts with explicit IDs for:
    • tdnf (try-tdnf)
    • NuGet (try-nuget)
    • npm (try-npm)
  • Kept publish focused on the deploy target project with a single publish step:
    • dotnet publish --no-restore /App/src/Microsoft.TryDotNet
  • Added .dockerignore entries to reduce context churn and avoid cache-dance directory leakage (.buildkit-cache, scratch, **/obj, **/bin, etc.).

CI cache behavior

  • Added Buildx + reproducible-containers/buildkit-cache-dance to persist RUN --mount=type=cache data across runs.
  • PR path is restore-only for mount caches and read-only for GHA layer caches.
  • Main branch writes cache state.
  • Guarded cache writes so only refs/heads/main can write scope=try-main and save mount cache, avoiding cache pollution from non-main workflow_dispatch runs.
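
The read/write split described in this list might look roughly like the workflow excerpt below. It is a sketch, not the PR's actual workflow: the step names, the checkout and setup-buildx action versions, `push: false`, and `mode=max` are assumptions. Only `scope=try-main`, the `refs/heads/main` guard, and `docker/build-push-action@v7` come from this PR.

```yaml
# Hypothetical excerpt of one job's steps (names and versions are assumptions).
steps:
  - uses: actions/checkout@v4
  - uses: docker/setup-buildx-action@v3
    id: setup-buildx

  # Pull requests: read the layer cache written by main, never write to it.
  - name: Docker build (PR, no push)
    if: github.event_name == 'pull_request'
    uses: docker/build-push-action@v7
    with:
      context: .
      push: false
      cache-from: type=gha,scope=try-main

  # Only refs/heads/main may update the shared layer cache.
  # On pull_request events github.ref is refs/pull/<n>/merge, so this
  # step does not run for PRs or for workflow_dispatch on other branches.
  - name: Docker build (main)
    if: github.ref == 'refs/heads/main'
    uses: docker/build-push-action@v7
    with:
      context: .
      push: false
      cache-from: type=gha,scope=try-main
      cache-to: type=gha,scope=try-main,mode=max  # mode=max is an assumption
```

Keeping `cache-to` out of the PR step is what makes that path read-only: without a `cache-to` entry the build can import layers but has nowhere to export them.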

Notes for reviewers

  • The goal is faster incremental rebuilds and fewer broad invalidations, not necessarily smaller total cache footprint.
  • This follows the same cache-hardening pattern used in prior reference implementations, adapted to this repository's workflow structure.

- Reorder build stage: install OS tools first, then copy only manifest
  files (*.csproj, package.json/lock, NuGet.config, Directory.*.props,
  eng/), run dotnet restore, then COPY all source. Routine code edits
  no longer bust the tool-install and restore layers.
- Add --mount=type=cache for /root/.nuget/packages (both restore and
  publish) and /root/.npm (build-js.sh) to speed up local builds.
- Combine separate tdnf install commands into one RUN in both stages,
  avoiding unnecessary layer overhead.
- Add .dockerignore to exclude .git, **/obj, **/bin, **/dist,
  **/node_modules, TestResults, artifacts, and *.tar. The **/obj
  exclusion is important: local dotnet restore writes project.assets.json
  there; without ignoring it those files could overwrite the container's
  restore output and cause silent mismatches.
- Add cache-from: type=gha to the PR Docker build step so pull requests
  can read from the cache written by main-branch builds, avoiding full
  rebuilds on every PR. PRs do not write cache to avoid consuming
  the 10 GB quota with per-branch entries.
Use reproducible-containers/buildkit-cache-dance@v3 to serialize the
BuildKit cache mounts (/root/.nuget/packages and /root/.npm) into
actions/cache between runs. On ephemeral GHA runners these mounts were
previously empty at the start of every build; now they are warm on every
run where dependencies haven't changed.

Changes:
- Add id: setup-buildx to the setup-buildx action step (required for
  the builder name output used by cache-dance).
- Add actions/cache@v4 step with a key that includes hashes of
  Dockerfile, Directory.Packages.props, NuGet.config, and all
  package-lock.json files. restore-keys provides a warm fallback
  when only a subset of those files changes.
- Add reproducible-containers/buildkit-cache-dance@v3 step before
  the Docker builds. skip-extraction is set from the cache-hit output
  so extraction is skipped when the key is an exact hit (actions/cache
  will not overwrite an existing entry with the same key anyway).
- Add .buildkit-cache and scratch to .dockerignore. The cache-dance
  action populates .buildkit-cache with serialized NuGet and npm
  package data (can exceed 1 GB) and uses scratch as a temp dir;
  both must be excluded from the Docker build context.
- Add '# syntax=docker/dockerfile:1' header to Dockerfile to explicitly
  select the BuildKit Dockerfile frontend (recommended when using
  --mount=type=cache).
- Add --mount=type=cache,target=/var/cache/tdnf,sharing=locked to both
  tdnf install RUN instructions. The cache mount keeps downloaded RPMs
  out of the image layer without needing tdnf clean all; remove those
  clean calls so the persistent cache is not wiped after each install.
- Add permissions: actions: write to the build-and-test job.
  docker/build-push-action requires this permission to write to the
  GitHub Actions cache (type=gha). Without it cache writes fail silently.
- Add scope=try-main to GHA Docker layer cache (cache-from/cache-to) so
  PR builds can't evict main branch cache entries.
- Switch to actions/cache/restore@v5 (restore-only) + conditional
  actions/cache/save@v5 (main only) so PRs never write buildkit-cache
  entries to the shared GHA cache.
- Set skip-extraction based on event type: false on push (always refresh
  cache after build), true on pull_request (no save, no need to extract).
- Add explicit cache mount IDs (try-tdnf, try-nuget, try-npm) to all
  --mount=type=cache directives in Dockerfile to prevent collisions on
  shared runners.
Caches the compiled output as a distinct layer. If publish fails
(e.g. because of a publish configuration issue), the build layer is
already cached, so the next run skips the full recompile.

Avoids compiling test projects, SimulatorGenerator, and WasmRunner,
which are not needed in the runtime image. dotnet build resolves
project references automatically, so all required dependencies still build.
- Split non-PR container build into main/non-main paths
- Keep cache-to scope=try-main only on refs/heads/main
- Restrict actions/cache/save for buildkit mounts to main branch
- Keep non-main dispatch builds read-only against main cache
Revert split build/publish layering and keep a single publish command
for the deployment target project:
- dotnet publish --no-restore /App/src/Microsoft.TryDotNet

This keeps the Dockerfile simpler while still avoiding a second restore.
Copilot AI review requested due to automatic review settings May 19, 2026 19:55
@BenjaminMichaelis BenjaminMichaelis temporarily deployed to BuildAndUploadImage May 19, 2026 19:55 — with GitHub Actions Inactive

Copilot AI left a comment

Pull request overview

This PR improves Docker build performance and CI cache stability by restructuring the Dockerfile to maximize layer reuse, introducing persistent BuildKit cache mounts (NuGet/npm/tdnf), and hardening GitHub Actions caching so only main updates shared cache state.

Changes:

  • Re-layered the Dockerfile to restore dependencies before copying full source, and added BuildKit --mount=type=cache for tdnf, NuGet, and npm.
  • Added mount-cache persistence in CI via reproducible-containers/buildkit-cache-dance + actions/cache, and scoped GHA layer caching to try-main.
  • Added a .dockerignore to reduce build context churn (e.g., bin/, obj/, node_modules/, .buildkit-cache).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| Dockerfile | Splits stable vs. volatile layers and adds BuildKit cache mounts for faster rebuilds. |
| .github/workflows/Build-Test-And-Deploy.yaml | Adds Buildx + mount cache persistence, and restricts cache writes to refs/heads/main. |
| .dockerignore | Reduces Docker context size/churn to improve cache hit rates and build speed. |

Comments suppressed due to low confidence (1)

.github/workflows/Build-Test-And-Deploy.yaml:77

  • The workflow isn't configured to run on merge_group events (the on: section only lists push, pull_request, and workflow_dispatch), so the || github.event_name == 'merge_group' part of this condition is currently unreachable. Either add merge_group: to the workflow triggers or drop the extra condition to avoid confusion.
      - name: Docker build (no push)
        if: github.event_name == 'pull_request' || github.event_name == 'merge_group'
        uses: docker/build-push-action@v7
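
If the first option in the comment above were taken, the trigger block might look like the sketch below. The real workflow's branch or path filters are not shown in this thread, so the bare event keys here are an assumption.

```yaml
# Sketch only: filters on the existing triggers are not reproduced here.
on:
  push:
  pull_request:
  workflow_dispatch:
  merge_group:   # makes the github.event_name == 'merge_group' check reachable
```

Dropping the `merge_group` check from the step's `if:` instead would leave the triggers untouched and simply remove the unreachable half of the condition.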

Comment thread .github/workflows/Build-Test-And-Deploy.yaml
@BenjaminMichaelis BenjaminMichaelis temporarily deployed to BuildAndUploadImage May 19, 2026 20:09 — with GitHub Actions Inactive
- Keep PR validation path as build-only/no-push
- Keep main branch path for image build, cache writes, and artifact upload
- Remove non-main non-PR image build path
- Gate deploy jobs to main branch to match artifact-producing path
@BenjaminMichaelis BenjaminMichaelis temporarily deployed to BuildAndUploadImage May 19, 2026 20:59 — with GitHub Actions Inactive
- Use actions/cache@v5 on main (restore + post-job save)
- Keep PR path restore-only with actions/cache/restore@v5
- Remove explicit actions/cache/save step
- Keep extraction disabled on PRs and on main cache-hit runs
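
The mount-cache arrangement this commit settles on could be sketched as follows. The cache key prefix, the step names and ids, the `.buildkit-cache/...` source directory names, and the exact `cache-map` shape are assumptions for illustration. The NuGet and npm mount targets, the mount IDs, the hashed files, and the `setup-buildx` id are taken from the earlier commit messages in this PR.

```yaml
# Hypothetical excerpt; assumes an earlier docker/setup-buildx-action step with id: setup-buildx.
steps:
  # Main: restore now, save automatically in the post-job phase.
  - name: Mount cache (main, restore + save)
    id: mount-cache-main
    if: github.ref == 'refs/heads/main'
    uses: actions/cache@v5
    with:
      path: .buildkit-cache
      key: buildkit-mounts-${{ hashFiles('Dockerfile', 'Directory.Packages.props', 'NuGet.config', '**/package-lock.json') }}
      restore-keys: |
        buildkit-mounts-

  # Pull requests: restore only, so nothing is ever written back.
  - name: Mount cache (PR, restore only)
    if: github.event_name == 'pull_request'
    uses: actions/cache/restore@v5
    with:
      path: .buildkit-cache
      key: buildkit-mounts-${{ hashFiles('Dockerfile', 'Directory.Packages.props', 'NuGet.config', '**/package-lock.json') }}
      restore-keys: |
        buildkit-mounts-

  # Inject the restored directories into the BuildKit cache mounts.
  - name: Inject mount caches into BuildKit
    uses: reproducible-containers/buildkit-cache-dance@v3
    with:
      builder: ${{ steps.setup-buildx.outputs.name }}
      cache-map: |
        {
          ".buildkit-cache/nuget": { "target": "/root/.nuget/packages", "id": "try-nuget" },
          ".buildkit-cache/npm": { "target": "/root/.npm", "id": "try-npm" }
        }
      # Skip extraction on PRs (no save follows) and on exact cache hits on main.
      skip-extraction: ${{ github.event_name == 'pull_request' || steps.mount-cache-main.outputs.cache-hit == 'true' }}
```

When the main-branch step is skipped, its `cache-hit` output is empty, so the second half of the `skip-extraction` expression evaluates to false and only the event-name check applies.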
@BenjaminMichaelis BenjaminMichaelis temporarily deployed to BuildAndUploadImage May 19, 2026 21:06 — with GitHub Actions Inactive
@BenjaminMichaelis BenjaminMichaelis changed the title Optimize Docker caching and harden CI cache behavior feat: Optimize Docker caching and harden CI cache behavior May 19, 2026
@BenjaminMichaelis BenjaminMichaelis merged commit 9f41e60 into main May 19, 2026
11 checks passed
@BenjaminMichaelis BenjaminMichaelis deleted the benjaminmichaelis/dockerfile-cache-optimization branch May 19, 2026 21:30