
ci(gfx11): build one universal multi-arch ROCm package #24

Merged: jimw567 merged 3 commits into gfx11 from jimwu.multiarch-rocm on Jun 17, 2026



jimw567 (Collaborator) commented Jun 16, 2026

Summary

  • Collapse the 4-leg per-family build matrix (gfx1151/gfx1150/gfx1153/gfx110X) into a single universal build sourced from TheRock's multi-arch ROCm tarball. One fat binary covers all current CI arches (gfx1100/1101/1102/1103/1150/1151/1153) and ships as one universal release archive instead of four mostly-duplicate per-family archives.
  • The ~11.5 GB multi-arch tarball is streamed and pruned at the tar level: all .kpack device-code files are dropped, along with the per-arch Tensile databases of every non-target arch (datacenter and unused consumer). Because the tarball is filtered as it streams in rather than downloaded first, the full 11.5 GB never lands on disk, which keeps the runner's disk footprint small.
  • Tensile-only packaging: the GEMM path that llama.cpp uses works from the per-arch rocBLAS/hipBLASLt Tensile DB alone, as validated on gfx1151 hardware (rocBLAS sgemm succeeds with ROCM_KPACK_DISABLE=1). No .kpack files are bundled.
  • test-gfx is de-matrixed to a single gfx1151 hardware run on the universal artifact (real llama-cli inference), which serves as the end-to-end safety net. create-release emits one llama-<tag>-ubuntu-rocm-universal-x64.tar.gz.
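The tar-level pruning described above can be sketched with the Python standard library's tarfile stream modes. This is an illustration under stated assumptions, not the workflow's actual implementation: the library directory names (lib/rocblas/library/, lib/hipblaslt/library/), the premise that every per-arch Tensile DB entry carries its gfx arch name somewhere in its path, and the helper names keep and prune_stream are all hypothetical.

```python
import re
import tarfile

# The 7 target arches listed in the summary above.
TARGET_ARCHES = {"gfx1100", "gfx1101", "gfx1102", "gfx1103",
                 "gfx1150", "gfx1151", "gfx1153"}

# Assumption (hypothetical layout): per-arch Tensile DBs live under these
# library dirs and name their arch (e.g. "gfx942") somewhere in the path.
TENSILE_DIRS = ("lib/rocblas/library/", "lib/hipblaslt/library/")
ARCH_RE = re.compile(r"(?<![0-9a-z])gfx[0-9a-f]+(?![0-9a-z])")


def keep(name: str) -> bool:
    """Return True if a tar member should survive pruning."""
    parts = name.split("/")
    # Drop every .kpack file, and anything inside a *.kpack directory.
    if any(part.endswith(".kpack") for part in parts):
        return False
    # Inside the Tensile library dirs, drop entries tied to a non-target arch.
    if any(d in name for d in TENSILE_DIRS):
        if set(ARCH_RE.findall(name)) - TARGET_ARCHES:
            return False
    return True


def prune_stream(src, dst) -> None:
    """Copy a tar stream from src to dst, keeping only wanted members.

    Both sides use tarfile's stream modes ("r|*" reads any supported
    compression, "w|" writes plain tar), so members are processed strictly
    in order and the input archive never has to exist on disk.
    Caveat: a hard link whose target was pruned would dangle; not handled.
    """
    with tarfile.open(fileobj=src, mode="r|*") as tin, \
         tarfile.open(fileobj=dst, mode="w|") as tout:
        for member in tin:
            if not keep(member.name):
                continue  # tarfile skips the unread data on the next iteration
            # Only regular files carry data; dirs and links are header-only.
            data = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, data)
```

As a filter script, prune_stream(sys.stdin.buffer, sys.stdout.buffer) would sit in a pipe such as curl -sL "$TARBALL_URL" | python3 prune.py | tar -x -C /opt/rocm, where $TARBALL_URL and prune.py are placeholders. Only the kept, extracted files ever reach the runner's disk.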

Test plan

  • build-ubuntu completes without disk exhaustion; extracted /opt/rocm/lib/rocblas/library contains only the 7 target arch dirs and no .kpack dir exists.
  • test-gfx on linux-gfx1151-gpu-rocm passes all checks (device selected, layers offloaded, correct deterministic answer to "2+2").
  • Manual dispatch with create_release=true publishes a single *-universal-x64.tar.gz asset that extracts to a runnable tree.
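The first test-plan item could be automated with a small post-extraction check. The sketch below assumes the rocBLAS library directory holds one subdirectory per arch, named after the arch (for example gfx1151), and treats any file or directory whose name ends in .kpack as a leftover; check_rocm_tree is a hypothetical helper, not part of the workflow.

```python
import os
from pathlib import Path
from typing import List

# The 7 target arches listed in the summary.
TARGET_ARCHES = {"gfx1100", "gfx1101", "gfx1102", "gfx1103",
                 "gfx1150", "gfx1151", "gfx1153"}


def check_rocm_tree(root: Path) -> List[str]:
    """Return human-readable problems with an extracted ROCm tree.

    An empty list means the tree matches the test plan: the rocBLAS
    library dir holds exactly the target arch dirs, and no file or
    directory whose name ends in ".kpack" exists anywhere under root.
    """
    lib = root / "lib" / "rocblas" / "library"
    if not lib.is_dir():
        return [f"missing {lib}"]
    problems = []
    # Assumption: per-arch subdirectories are named after the arch.
    arch_dirs = {p.name for p in lib.iterdir()
                 if p.is_dir() and p.name.startswith("gfx")}
    if arch_dirs != TARGET_ARCHES:
        extra = sorted(arch_dirs - TARGET_ARCHES)
        missing = sorted(TARGET_ARCHES - arch_dirs)
        problems.append(f"arch dirs: extra={extra} missing={missing}")
    # os.walk visits dot-names such as ".kpack" as well.
    leftovers = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.endswith(".kpack"):
                leftovers.append(os.path.join(dirpath, name))
    if leftovers:
        problems.append(f".kpack paths present: {sorted(leftovers)}")
    return problems
```

In build-ubuntu this would run right after extraction, with check_rocm_tree(Path("/opt/rocm")) failing the job whenever the returned list is non-empty.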

Collapse the 4-leg per-family matrix (gfx1151/gfx1150/gfx1153/gfx110X)
into a single build sourced from TheRock's multi-arch tarball. One fat
binary covers all current CI arches (gfx1100-1103, gfx1150/1151/1153)
and ships as one universal release archive instead of four mostly-
duplicate per-family archives.

The multi-arch tarball is streamed and pruned at the tar level: drop all
.kpack files and the Tensile DBs of every non-target arch. The GEMM path
llama.cpp uses works from the per-arch Tensile DB alone (validated on
gfx1151 hardware: rocBLAS sgemm succeeds with ROCM_KPACK_DISABLE=1), so
no .kpack files are bundled. The gfx1151 hardware test job is the
end-to-end safety net.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Comment thread: .github/workflows/build-gfx11-rocm.yml (outdated)
Jim Wu and others added 2 commits June 17, 2026 08:32
The "universal" vs "multi-arch" wording was redundant — both describe one
package covering many arches. Standardize on "multiarch" to match TheRock's
upstream vocabulary. Renames the artifact/archive to
llama-<TAG>-ubuntu-rocm-multiarch-x64 and updates comments + release body.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Follow-up to the package rename: drop the hyphenated "multi-arch" prose in
comments, echoes, and the release body for one consistent spelling. The
upstream nightlies endpoint (tarball-multi-arch) and the
therock-dist-linux-multiarch- filenames are external names and are left
untouched.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
jimw567 requested a review from eble-amd on June 17, 2026 14:51
Comment thread: .github/workflows/build-gfx11-rocm.yml
jimw567 requested a review from Annieren on June 17, 2026 22:20

Annieren left a comment


lgtm

jimw567 merged commit b5e799d into gfx11 on Jun 17, 2026
3 checks passed

Labels: None yet
Projects: None yet
Participants: 3