feat: Add arc ce slurm CI test #7
Open
aspiringmind-code wants to merge 18 commits into
Add ARC CE + SLURM integration test (GitHub Actions)
Summary
Adds a self-contained Docker image that runs a NorduGrid ARC7 Compute Element wired to a single-node SLURM batch system, plus a GitHub Actions workflow that builds the image, boots it, and runs a full submit → monitor → retrieve integration test against it using the real ARC client tools (`arcsub`, `arcstat`, `arcget`).
What's included
- `docker/Dockerfile`: AlmaLinux 9 image with `munge`, SLURM (`slurmctld`/`slurmd`), ARC7 (`nordugrid-arc7-arex`/`-client`/`-arcctl`), and `dbus-broker`, running `systemd` as PID 1 so the packages' own unit files and `arcctl` work as designed.
- `docker/slurm.conf`, `docker/cgroup.conf`, `docker/arc.conf`: single-node SLURM cluster and ARC CE config (LRMS backend = SLURM, REST interface on :443).
- `docker/bootstrap.sh` + `docker/arc-bootstrap.service`: one-shot startup that waits for munge/SLURM, mints an ARC Test-CA host cert bound to the container's runtime hostname, starts A-REX, and issues a test client certificate.
- `docker/healthcheck.sh`: Docker `HEALTHCHECK` that gates readiness on the bootstrap sequence actually completing.
- `test/job.xrsl`, `test/run.sh`, `test/run_integration_test.sh`: the integration test itself, which submits a job, polls until `Finished`, retrieves output, and asserts its contents.
- `.github/workflows/integration-test.yml`: builds the image, runs it `--privileged` (required for the systemd/cgroup setup), waits for health, runs the test, and uploads ARC/SLURM logs as artifacts on every run (pass or fail).
- `docker-compose.yml` + `README.md`: local reproduction of the same flow, and a write-up of the design decisions below.
Notable fixes baked into this config (found via iterative debugging)
- `munge.key` isn't auto-generated in this build environment, so it is generated explicitly with correct ownership.
- `NodeName` must match the container's actual runtime hostname, or `slurmd` fails immediately with "Unable to determine this slurmd's NodeName".
- `arc.conf` needed corrections against ARC7's actual schema: no `x509_user_key`/`x509_user_cert` in `[common]`; `allowaccess` is only valid in `[arex/ws/jobs]` (not `[arex/ws]`); and `[infosys]`/`[infosys/glue2]` are mandatory blocks without which A-REX's info provider fails on every cycle.
- `slurmd` needs a working D-Bus connection for its cgroup "extern step" scope (SLURM 21+, independent of `ProctrackType`/`CgroupAutomount`), so `dbus-broker` was added.
- `slurm_use_sacct = yes` silently stalls job-completion detection when no `slurmdbd` is configured, so job scanning was switched to `squeue`/`scontrol`.
CI platform note
This was originally built against GitLab CI, but the available GitLab runner is a locked-down Kubernetes executor (Kyverno registry allowlist, no privileged pods), and this setup fundamentally needs privileged containers (systemd as PID 1, `--privileged` for cgroup/dbus access). It was moved to GitHub Actions, where hosted runners are plain VMs with Docker preinstalled and unrestricted `--privileged` support, so no runner/cluster admin changes are required. `.gitlab-ci.yml` has been removed accordingly.
Testing
Verified locally via `docker compose up --build` and in GitHub Actions (`.github/workflows/integration-test.yml`): the full pipeline builds, boots, and passes submit/monitor/retrieve end to end.
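For reviewers unfamiliar with the "polls until `Finished`" step of the integration test, the loop can be pictured roughly as below. This is only an illustrative sketch, not the contents of `test/run_integration_test.sh`: the function name `poll_until_finished`, its argument order, and the return codes are hypothetical, and it assumes the status command prints a line containing `State: <name>` (as `arcstat` is expected to), with `Failed` and `Killed` treated as terminal failures.

```bash
# Hypothetical helper (not taken from this PR): re-run a status command until
# its output reports the job as Finished.
#   usage:   poll_until_finished TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
#   returns: 0 = Finished, 1 = Failed/Killed, 2 = timed out
poll_until_finished() {
  local timeout=$1 interval=$2
  shift 2
  local deadline output
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # Capture stdout and stderr; a non-zero exit from the status command is
    # tolerated because the service may still be settling.
    output=$("$@" 2>&1) || true
    case $output in
      *"State: Finished"*)
        return 0 ;;
      *"State: Failed"*|*"State: Killed"*)
        # Assumed terminal failure states; show the status output for debugging.
        printf '%s\n' "$output" >&2
        return 1 ;;
    esac
    sleep "$interval"
  done
  return 2
}
```

A call such as `poll_until_finished 300 5 arcstat "$jobid"` would then block for up to five minutes, where `$jobid` is assumed to hold the job ID reported by `arcsub`. Distinguishing a timeout (2) from a failed job (1) keeps the CI log clear about whether the job itself broke or completion detection stalled.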