fix(build): apt bins to app stage, per-binary COPYs, no more lost runtime deps (closes #149)#150
Merged
Merged
Conversation
… fix checksums path (closes #147) Five interlocking bugs in the `forge build` output that compound to a bloated runtime image and a silently-disabled integrity check: 1. **Prompt-builder body duplication.** A SKILL.md with N `## Tool:` sections parses into N SkillEntry values that all share the same Body. The compiler's prompt builder appended the body once per entry, so the bundled code-review-github skill (4 tools) shipped 4× repeats and aibuilderdemo's compiled/prompt.txt was 1199 lines for what should be ~250. Dedup via a seen-bodies set; per-skill JSON entries still carry the full Body for SDK consumers. 2. **compiled/prompt.txt and compiled/skills/skills.json are dead.** The runtime never opens either file — runner.discoverSkillFiles() re-globs skills/SKILL.md on every startup. Stopped generating them; deleted forge-cli/skills/writer.go; pruned the matching COPY lines from the Dockerfile template and the validate stage's required-files check. External library consumers (forgecore.Compile) still receive the in-memory CompiledSkills struct directly. 3. **Dockerfile `COPY . .` leaked operator-side artifacts into /app.** Expanded .dockerignore to exclude k8s/, build-manifest.json, the entire compiled/ directory, the Dockerfile itself, the .dockerignore itself, and .local-bins/. The runtime image now contains only forge.yaml, guardrails.json, <channel>-config.yaml, agent.json, policy-scaffold.json, skills/, and checksums.json. 4. **checksums.json integrity check silently skipped in container.** The runner looked at <WorkDir>/.forge-output/checksums.json, which never exists inside a Forge container — the file lands at /app/checksums.json directly. Added a fallback that picks the flattened layout when the operator-side layout is absent. 5. **Removed redundant COPY lines from Dockerfile.tmpl** that would have failed after fix #2 anyway (the targets no longer exist). End-to-end verified on the user's aibuilderdemo agent: - Pre-fix: 21 files in .forge-output, compiled/prompt.txt 1199 lines, image contained k8s/ + build-manifest.json + security-audit.json + recursive Dockerfile + dead compiled/ artifacts. - Post-fix: 19 files (compiled/prompt.txt + compiled/skills/skills.json removed), image contains only the 8 files the runtime opens, operator artifacts intact on disk for forge package / kubectl apply. Tests pin all four invariants: - TestCompile_BodyDeduplication_* (3 tests) - TestSkillsStage_WithSkills updated to assert artifacts NOT created - TestDockerignore_ExcludesOperatorSideArtifacts + TestDockerignore_KeepsRuntimeFiles + TestDockerfile_NoLongerCopiesDeadCompiledArtifacts - TestResolveChecksumsDir_* (3 tests) + TestVerifyBuildOutput_ContainerLayoutWorksEndToEnd
…time deps (closes #149) apt-installed binaries declared by skills were silently lost in the final image. The pre-fix flow: every skill bin (curl, git, jq, gh) went through a single shared `bins` stage. apt placed curl/git/jq at /usr/bin/ (Debian's default), but the application stage only did `COPY --from=bins /usr/local/bin/ /usr/local/bin/` — picking up just the manually-placed `gh` and dropping the apt-installed bins entirely. Any skill script that called curl/git/jq would log "command not found" at runtime; only `gh` worked because its install path explicitly moved it to /usr/local/bin/gh. The blunt wholesale-directory COPY also hid the bug — there was no per-binary contract to assert against. New code, new artifact, easy to miss in review. This change separates the bins-stage and app-stage concerns: - apt/apk binaries → installed in the application stage directly via the existing `RUN apt-get install` slot. The package manager's dependency resolution pulls /usr/lib/, /etc/, etc. along automatically, so per-bin `COPY --from=bins` plumbing (which can't preserve lib deps) is avoided. - direct-URL + custom-RUN binaries → still flow through the shared `bins` stage (they need curl + a writable filesystem to download and unpack). The bins stage's apt install is now strictly scoped to build-time helpers (curl + ca-certificates) that never reach the app image. - image-copy binaries (e.g. playwright) → copied straight from their `bin-<name>` companion stage to the app stage, skipping the bins intermediate hop. - local-file binaries → emitted as `COPY .local-bins/<name>` in the app stage (no `--from`, no bins-stage routing). `packaging.GenerateDockerfile` now returns a structured `DockerfileFragments` with separate slots for `PreAppStages`, `RuntimeAptPackages`, `RuntimeApkPackages`, `BinCopies`, and `PathExtensions`. The template consumes each slot in the right place; the wholesale `COPY --from=bins /usr/local/bin/ /usr/local/bin/` is gone, replaced by an explicit per-binary loop. End-to-end verified on the bundled aibuilderdemo agent (curl/git/jq via apt, gh via custom-RUN). The pre-fix Dockerfile had curl/git/jq trapped in the bins stage and dropped on COPY. Post-fix: FROM debian:bookworm-slim AS bins RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates && rm -rf /var/lib/apt/lists/* RUN curl -fsSL https://github.com/cli/cli/releases/.../gh_2.60.0_linux_amd64.tar.gz | tar xz -C /tmp RUN mv /tmp/.../gh /usr/local/bin/gh RUN chmod 0755 /usr/local/bin/gh # --- Application stage --- FROM debian:bookworm-slim ... WORKDIR /app COPY --from=bins /usr/local/bin/gh /usr/local/bin/gh COPY . . RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl git jq && rm -rf /var/lib/apt/lists/* ... ENTRYPOINT ["forge","run","--host","0.0.0.0","--with","slack"] Tests pin the new contract per install method (apt/apk in app stage, direct-URL/custom-RUN via per-binary COPY from bins, image-copy via per-binary COPY from companion stage, local-file from build context), plus a regression test for the bundled code-review skill shape (curl+git+jq+gh) confirming all four reach the final image.
…all methods, how to add a bin Adds a dedicated reference for the bin pipeline so contributors and operators don't have to read forge-core/packaging/ to know how to declare a new tool in a SKILL.md or override one at the project level. Covers: - The four-source priority walker (local-file → skill-local → forge.yaml override → registry → fallback), with the actual precedence implemented in bin_classifier.go. - The six install methods (apt, apk, direct-URL, custom-run, image-copy, local-file) and which Dockerfile slot each routes through after the issue #149 split. - The four ways to add a binary: use a registry entry, declare inline in SKILL.md, override in forge.yaml, or supply a local file via --local-bin. Each path with a copy-pasteable example. - A registry-entry field reference for contributors adding to image-registry.yaml. - A quick decision tree and a sample of the post-#149 generated Dockerfile so readers see what `forge build` actually emits. Cross-links added in: - docs/core-concepts/skill-md-format.md — the bins: bullet now points here for resolution rules. - docs/reference/forge-yaml-schema.md — package.bin_overrides points here for source priority. - docs/reference/cli-reference.md — --local-bin rows in both `forge build` and `forge package` point at the local-file section. - docs/deployment/docker.md — bin pipeline reference under "Building Agent Container Images". - README.md — Core Concepts table gains a row. sync-docs mapping table extended so future changes to forge-skills/registry/image-registry.yaml, forge-core/packaging/, forge-cli/templates/Dockerfile.tmpl, or forge-cli/build/dockerfile_stage.go refresh this doc.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #149. Stacked on top of #148.
Summary
apt-installed binaries declared by skills were silently lost in the runtime image. The bundled `code-review` skill needs `curl`, `git`, `jq`, `gh` — only `gh` survived. Any skill script calling `curl`/`git`/`jq` would log "command not found" at runtime.
Also addressed: the wholesale-directory `COPY --from=bins /usr/local/bin/ /usr/local/bin/` that masked the bug — replaced with explicit per-binary `COPY` lines for every binary that reaches the app stage.
Root cause
The pre-fix flow put every skill bin through a single shared `bins` stage:
```
FROM debian:bookworm-slim AS bins
RUN apt-get install -y curl git jq # ← apt lands these at /usr/bin/
RUN curl ... | tar xz ... && mv .../gh /usr/local/bin/gh
--- Application stage ---
COPY --from=bins /usr/local/bin/ /usr/local/bin/ # ← only catches gh
```
apt drops binaries at `/usr/bin/` per Debian convention, with transitive deps in `/usr/lib/`, `/etc/`. Copying just `/usr/local/bin/` from the bins stage left them behind. And even a per-binary `COPY --from=bins /usr/bin/git` wouldn't work because git depends on `/usr/lib/git-core/` and other support files.
Fix
Restructured `packaging.GenerateDockerfile` to return a structured `DockerfileFragments` separating concerns by install method:
`DockerfileFragments` exposes:
```go
type DockerfileFragments struct {
PreAppStages string // companion image-copy stages + shared bins stage
RuntimeAptPackages []string // app stage installs
RuntimeApkPackages []string // app stage installs (alpine)
BinCopies []string // explicit per-binary COPY lines for app stage
PathExtensions []string
}
```
The bins stage's apt install is now strictly `curl ca-certificates` (build-time helpers only), scoped to the stage and never propagating to the runtime image.
End-to-end repro
aibuilderdemo (declares curl/git/jq via apt, gh via custom-RUN):
Generated app-stage excerpt post-fix:
```
WORKDIR /app
COPY --from=bins /usr/local/bin/gh /usr/local/bin/gh
COPY . .
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl git jq && rm -rf /var/lib/apt/lists/*
```
Files changed
Test plan
Out of scope
Stacked on
This PR targets the `fix/issue-147-build-output` branch (PR #148) rather than `main` so the `Dockerfile.tmpl` changes from #148 don't conflict. Will rebase to `main` after #148 merges.