
Compatibility layer for EESSI version 2026.06#235

Open
bedroge wants to merge 95 commits into EESSI:main from bedroge:2026.x

@bedroge

@bedroge bedroge commented May 8, 2026

Collaborator

Still WIP, but testing some new functionality already...

@bedroge

bedroge commented May 11, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-compat arch:x86_64/generic

@bedroge

bedroge commented May 11, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic

@eessi-bot-aws

eessi-bot-aws Bot commented May 11, 2026


New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156325

date job status comment
May 11 19:48:57 UTC 2026 submitted job id 156325 awaits release by job manager
May 11 19:49:15 UTC 2026 released job awaits launch by Slurm scheduler
May 11 19:55:17 UTC 2026 running job 156325 is running
May 11 19:57:20 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156325.out
❌ some task failed
❌ no tarball found
Artefacts
Details
May 11 19:57:20 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-156325.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented May 11, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic

@eessi-bot-aws

eessi-bot-aws Bot commented May 11, 2026


New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156326

date job status comment
May 11 21:20:19 UTC 2026 submitted job id 156326 awaits release by job manager
May 11 21:20:30 UTC 2026 released job awaits launch by Slurm scheduler
May 11 21:25:33 UTC 2026 running job 156326 is running
May 12 00:33:38 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156326.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1778545846.tar.gz
size: 1316 MiB (1380624192 bytes)
entries: 183549
May 12 00:33:38 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-156326.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented May 12, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic

@eessi-bot-aws

eessi-bot-aws Bot commented May 12, 2026


New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156327

date job status comment
May 12 07:28:32 UTC 2026 submitted job id 156327 awaits release by job manager
May 12 07:29:10 UTC 2026 released job awaits launch by Slurm scheduler
May 12 07:35:13 UTC 2026 running job 156327 is running
May 12 07:56:42 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156327.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1778572503.tar.gz
size: 389 MiB (408295890 bytes)
entries: 173293
May 12 07:56:42 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-156327.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented May 12, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic

@eessi-bot-aws

eessi-bot-aws Bot commented May 12, 2026


New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156328

date job status comment
May 12 08:17:53 UTC 2026 submitted job id 156328 awaits release by job manager
May 12 08:18:48 UTC 2026 released job awaits launch by Slurm scheduler
May 12 08:19:51 UTC 2026 running job 156328 is running
May 12 14:40:14 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156328.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1778596658.tar.gz
size: 1327 MiB (1391465593 bytes)
entries: 179430
May 12 14:40:14 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-156328.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@bedroge

bedroge commented May 12, 2026

Collaborator Author

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic

@eessi-bot-aws

eessi-bot-aws Bot commented May 12, 2026


New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156329

date job status comment
May 12 15:00:39 UTC 2026 submitted job id 156329 awaits release by job manager
May 12 15:01:20 UTC 2026 released job awaits launch by Slurm scheduler
May 12 15:02:23 UTC 2026 running job 156329 is running
May 12 19:02:42 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156329.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1778612402.tar.gz
size: 1442 MiB (1512120372 bytes)
entries: 196326
May 12 19:02:42 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-156329.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@Neves-P

Neves-P commented Jun 30, 2026

Member

We decided today to ingest the compat layer as #231 is necessary in the near future but not a strict blocker for 2026.06.

@casparvl casparvl added the bot:deploy Ask bot to deploy built tarballs for compat layer label Jun 30, 2026
@casparvl

Contributor

bot:status last_build

@eessi-bot-aws

eessi-bot-aws Bot commented Jun 30, 2026


This is the status of all the bot: build commands:

| on | for | repo | result | date | status | url |
| --- | --- | --- | --- | --- | --- | --- |
| generic | aarch64/generic | eessi.io-2025.06-compat | 😁 SUCCESS | Jun 19 01:49:05 UTC 2026 | finished | #235 (comment) |
| generic | riscv64/generic | eessi.io-2025.06-compat | 😁 SUCCESS | Jun 20 16:33:24 UTC 2026 | finished | #235 (comment) |
| generic | x86_64/generic | eessi.io-2025.06-compat | 😁 SUCCESS | Jun 19 01:22:10 UTC 2026 | finished | #235 (comment) |

@casparvl

Contributor

I really don't understand what I'm seeing in the staging PR...

Deployment contents
eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst 🔐
eessi-2025.06-software-linux-riscv64-generic-layer-1780038350.tar.zst 🔐
eessi-2026.06-compat-linux-riscv64-1781972080.tar.gz 🔐

I mean, I see only one uploaded tarball in this PR. #235 (comment) and #235 (comment) don't seem to have been uploaded for some reason.

Yet in the staging PR I see three tarballs, with names ranging from eessi-2023.06 to eessi-2026.06.

What I also don't understand is that our build command is:

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-riscv for:arch=riscv64/generic

I.e. the target repo is eessi.io-2025.06-compat? Why 2025.06? How does that result in a tarball with 2026.06 in the name? I'm utterly confused...
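To cross-check which tarballs actually belong to this PR, the file names listed above can be split into their parts. This is a minimal sketch, not the bot's own logic: the regular expression and the helper names `parse_tarball_name` and `unexpected_tarballs` are assumptions for illustration, inferred only from the three names quoted from the staging PR.

```python
import re

# Assumed naming pattern, inferred from the names quoted in this thread:
#   eessi-<version>-<layer>-<os>-<arch...>-<timestamp>.tar.<gz|zst>
TARBALL_RE = re.compile(
    r"^eessi-(?P<version>\d{4}\.\d{2})"
    r"-(?P<layer>compat|software)"
    r"-(?P<os>[^-]+)"
    r"-(?P<arch>.+)"
    r"-(?P<timestamp>\d+)"
    r"\.tar\.(?P<compression>gz|zst)$"
)


def parse_tarball_name(name):
    """Return the name's fields as a dict, or None if it does not match."""
    match = TARBALL_RE.match(name)
    return match.groupdict() if match else None


def unexpected_tarballs(names, version, layer):
    """List names that do not carry the expected EESSI version and layer."""
    result = []
    for name in names:
        parts = parse_tarball_name(name)
        if parts is None or parts["version"] != version or parts["layer"] != layer:
            result.append(name)
    return result


staged = [
    "eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst",
    "eessi-2025.06-software-linux-riscv64-generic-layer-1780038350.tar.zst",
    "eessi-2026.06-compat-linux-riscv64-1781972080.tar.gz",
]
for name in unexpected_tarballs(staged, version="2026.06", layer="compat"):
    print("not a 2026.06 compat tarball:", name)
```

Note that the version in the tarball name is read from the file name only; it says nothing about which `repo:` value was passed to the build command.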

@casparvl

Contributor

It seems the AWS cluster uploaded that strange 2023.06 tarball:

upload: ../../project/def-users/SHARED/jobs/2026.05/pr_235/160217/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst to s3://software.eessi.io-2023.06/2023.06/software/linux/x86_64/amd/zen2/17796967460.tar.zst/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst
Creating metadata file
create_metadata_file file=/project/def-users/SHARED/jobs/2026.05/pr_235/160217/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst url=https://software.eessi.io-2023.06.s3.amazonaws.com/2023.06/software/linux/x86_64/amd/zen2/17796967460.tar.zst/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst github_repository=EESSI/compatibility-layer pull_request_number=235 pr_comment_id=4532626274
metadata:
{
  "uploader": {
    "username": "bot",
    "ip": "3.123.10.249",
    "hostname": "login1.int.aws-rocky88-202310.eessi.io"
  },
  "payload": {
    "filename": "eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst",
    "size": "25394",
    "ctime": "Mon May 25 08:12:27 UTC 2026",
    "sha256sum": "b99b862cf7d9a0b6893431935ac24c38735f09c364f72cc7f3f121bc5dfcd98d",
    "url": "https://software.eessi.io-2023.06.s3.amazonaws.com/2023.06/software/linux/x86_64/amd/zen2/17796967460.tar.zst/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst"
  },
  "link2pr": {
    "repo": "EESSI/compatibility-layer",
    "pr": "235",
    "pr_comment_id": "4532626274"
  }
}

And at the end of it, it seems to have crashed (when uploading the metadata file?):

upload: ../../tmp/tmp.NYRpnWVZIe to s3://software.eessi.io-2023.06/2023.06/software/linux/x86_64/amd/zen2/17796967460.tar.zst/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst.meta.txt
'
           stderr 'INFO:    Environment variable SINGULARITY_TMPDIR is set, but APPTAINER_TMPDIR is preferred
INFO:    Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
INFO:    Using cached SIF image
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Signing file /project/def-users/SHARED/jobs/2026.05/pr_235/160217/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst
Write signature to /project/def-users/SHARED/jobs/2026.05/pr_235/160217/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst.sig
Signing file /tmp/tmp.NYRpnWVZIe
Write signature to /tmp/tmp.NYRpnWVZIe.sig
'
           exit code 0
[20260630-T15:14:51] WARNING: A crash occurred!
Traceback (most recent call last):
  File "/home/bot/.local/lib/python3.6/site-packages/pyghee/lib.py", line 170, in process_event
    self.handle_event(event_info, log_file=log_file)
  File "/home/bot/.local/lib/python3.6/site-packages/pyghee/lib.py", line 102, in handle_event
    handler(event_info, log_file=log_file)
  File "/home/bot/eessi-bot-software-layer/eessi_bot_event_handler.py", line 465, in handle_pull_request_event
    handler(event_info, pr)
  File "/home/bot/eessi-bot-software-layer/eessi_bot_event_handler.py", line 397, in handle_pull_request_labeled_event
    deploy_built_artefacts(pr, event_info)
  File "/home/bot/eessi-bot-software-layer/tasks/deploy.py", line 689, in deploy_built_artefacts
    upload_artefact(job_dir, artefact, repo_name, pr.number, pr_comment_id)
  File "/home/bot/eessi-bot-software-layer/tasks/deploy.py", line 443, in upload_artefact
    "succeeded")
  File "/home/bot/eessi-bot-software-layer/tasks/deploy.py", line 210, in update_pr_comment
    issue_comment = pr_comments.determine_issue_comment(pull_request, pr_comment_id, artefact)
  File "/home/bot/eessi-bot-software-layer/tools/pr_comments.py", line 94, in determine_issue_comment
    return pull_request.get_issue_comment(pr_comment_id)
  File "/home/bot/.local/lib/python3.6/site-packages/github/PullRequest.py", line 683, in get_issue_comment
    "GET", f"{self._parentUrl(self.issue_url)}/comments/{id}"
  File "/home/bot/.local/lib/python3.6/site-packages/github/Requester.py", line 355, in requestJsonAndCheck
    verb, url, parameters, headers, input, self.__customConnection(url)
  File "/home/bot/.local/lib/python3.6/site-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/issues/comments#get-an-issue-comment", "status": "404"}

This may be why it didn't proceed to the others (?)
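The traceback ends in the 404 raised by `pull_request.get_issue_comment(pr_comment_id)`. One way to keep a deploy pass going when the referenced PR comment no longer exists is to treat that 404 as "comment gone" and skip the status update. The sketch below only illustrates the idea and is not the bot's code: `FakePullRequest` and `find_issue_comment` are hypothetical, and the exception class is a local stand-in so the example runs without PyGithub. In the bot, the exception to catch would presumably be `github.GithubException.UnknownObjectException`, as named in the traceback.

```python
class UnknownObjectException(Exception):
    """Local stand-in for PyGithub's 404 exception (assumption for this sketch)."""


class FakePullRequest:
    """Hypothetical stand-in for a PyGithub PullRequest; mimics get_issue_comment only."""

    def __init__(self, comments):
        self._comments = dict(comments)

    def get_issue_comment(self, comment_id):
        try:
            return self._comments[comment_id]
        except KeyError:
            raise UnknownObjectException(f"404 Not Found: comment {comment_id}")


def find_issue_comment(pull_request, comment_id):
    """Return the comment, or None if it no longer exists (e.g. it was deleted)."""
    try:
        return pull_request.get_issue_comment(comment_id)
    except UnknownObjectException:
        return None


# Comment id 111 is made up; 4532626274 is the id from the log above.
pr = FakePullRequest({111: "job 111: finished"})
for comment_id in (111, 4532626274):
    comment = find_issue_comment(pr, comment_id)
    if comment is None:
        print(f"comment {comment_id} not found, skipping status update")
    else:
        print(f"updating comment {comment_id}")
```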

@casparvl

Contributor

Ok, let me first just try to redeploy...

@casparvl casparvl added bot:deploy Ask bot to deploy built tarballs for compat layer and removed bot:deploy Ask bot to deploy built tarballs for compat layer labels Jun 30, 2026
@casparvl

Contributor

Crash seems to be consistent. I'm thinking: it may be related to the fact that the bot is trying to add to an existing PR comment that got deleted - and that might mess up the entire state. I mean:

upload: ../../tmp/tmp.Oa2jTsZCni to s3://software.eessi.io-2023.06/2023.06/software/linux/x86_64/amd/zen2/17796967460.tar.zst/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst.meta.txt
'

I mean upload ../../tmp/tmp.<something>, that doesn't sound like a reasonable path for something to upload, right? I guess the state of the bot is just completely messed up.

@casparvl

Contributor

Renamed the two items it would normally upload, let's see if this prevents it from uploading these...

[bot@login1 160217]$ mv eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst.bak
[bot@login1 160217]$ mv eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst.sig eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst.sig.bak

@casparvl casparvl added bot:deploy Ask bot to deploy built tarballs for compat layer and removed bot:deploy Ask bot to deploy built tarballs for compat layer labels Jun 30, 2026
@casparvl

Contributor

Even worse:

[20260630-T16:22:08] run_cmd(): Error running '/usr/bin/apptainer exec --contain --no-home --bind /project/60006/SHARED/jobs/2026.05/pr_235/event_74b7f550-5810-11f1-8479-3cc4bc861bd0/run_000/x86_64/amd/zen2/eessi.io-2023.06-software --bind /project/def-users/SHARED/jobs/2026.05/pr_235 --bind /project/def-users/SHARED/jobs/2026.05/pr_235/160217 --bind /project/60006/SHARED/jobs/2026.05/pr_235 --bind /home/bot/eessi-bot-software-layer/scripts --bind /home/bot docker://ghcr.io/eessi/build-node:debian12 /home/bot/eessi-bot-software-layer/scripts/eessi-upload-to-staging --bucket-name software.eessi.io-2023.06 --pr-comment-id 4532626274 --pull-request-number 235 --repository EESSI/compatibility-layer --bot-instance eessi-bot-mc-aws --sign-key /home/bot/eessi-bot-aws.2026-05-12.private-key.pem --sign-script /home/bot/eessi-bot-software-layer/scripts/sign_verify_file_ssh.sh /project/def-users/SHARED/jobs/2026.05/pr_235/160217/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst' in 'None
           stdout ''/project/def-users/SHARED/jobs/2026.05/pr_235/160217/eessi-2023.06-software-linux-x86_64-amd-zen2-17796967460.tar.zst' is not a readable non zero-sized file.
'
           stderr 'INFO:    Environment variable SINGULARITY_TMPDIR is set, but APPTAINER_TMPDIR is preferred
INFO:    Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
INFO:    Using cached SIF image
'
           exit code 1
[20260630-T16:22:09] WARNING: A crash occurred!
Traceback (most recent call last):
  File "/home/bot/.local/lib/python3.6/site-packages/pyghee/lib.py", line 170, in process_event
    self.handle_event(event_info, log_file=log_file)
  File "/home/bot/.local/lib/python3.6/site-packages/pyghee/lib.py", line 102, in handle_event
    handler(event_info, log_file=log_file)
  File "/home/bot/eessi-bot-software-layer/eessi_bot_event_handler.py", line 465, in handle_pull_request_event
    handler(event_info, pr)
  File "/home/bot/eessi-bot-software-layer/eessi_bot_event_handler.py", line 397, in handle_pull_request_labeled_event
    deploy_built_artefacts(pr, event_info)
  File "/home/bot/eessi-bot-software-layer/tasks/deploy.py", line 689, in deploy_built_artefacts
    upload_artefact(job_dir, artefact, repo_name, pr.number, pr_comment_id)
  File "/home/bot/eessi-bot-software-layer/tasks/deploy.py", line 447, in upload_artefact
    "failed")
  File "/home/bot/eessi-bot-software-layer/tasks/deploy.py", line 210, in update_pr_comment
    issue_comment = pr_comments.determine_issue_comment(pull_request, pr_comment_id, artefact)
  File "/home/bot/eessi-bot-software-layer/tools/pr_comments.py", line 94, in determine_issue_comment
    return pull_request.get_issue_comment(pr_comment_id)
  File "/home/bot/.local/lib/python3.6/site-packages/github/PullRequest.py", line 683, in get_issue_comment
    "GET", f"{self._parentUrl(self.issue_url)}/comments/{id}"
  File "/home/bot/.local/lib/python3.6/site-packages/github/Requester.py", line 355, in requestJsonAndCheck
    verb, url, parameters, headers, input, self.__customConnection(url)
  File "/home/bot/.local/lib/python3.6/site-packages/github/Requester.py", line 378, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/issues/comments#get-an-issue-comment", "status": "404"}

I guess it doesn't actually check the directory; it just reconstructs the name from the metadata and assumes the file exists. Ok, next option: see if I can make the bot believe this was a failed build so that it will skip it...
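A check of the kind guessed at here could look like the sketch below: before handing a reconstructed path to the upload step, confirm it is a regular, readable, non-empty file (the same condition the upload script's error message names). `is_uploadable` and `split_artefacts` are hypothetical names; nothing here is taken from the bot's code.

```python
import os


def is_uploadable(path):
    """True if path is a regular, readable file with non-zero size."""
    return (
        os.path.isfile(path)
        and os.access(path, os.R_OK)
        and os.path.getsize(path) > 0
    )


def split_artefacts(paths):
    """Separate artefact paths into (present, missing) lists, keeping order."""
    present = [p for p in paths if is_uploadable(p)]
    missing = [p for p in paths if not is_uploadable(p)]
    return present, missing
```

With a split like this, the caller could upload only the `present` list and log the `missing` one instead of failing on the first renamed or removed tarball.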

@casparvl casparvl added bot:deploy Ask bot to deploy built tarballs for compat layer and removed bot:deploy Ask bot to deploy built tarballs for compat layer labels Jun 30, 2026
@casparvl

Contributor

Oof, there's another one as well:

upload: ../../project/def-users/SHARED/jobs/2026.05/pr_235/160218/eessi-2023.06-software-linux-x86_64-amd-zen3-17796969570.tar.zst to s3://software.eessi.io-2023.06/2023.06/software/linux/x86_64/amd/zen3/17796969570.tar.zst/eessi-2023.06-software-linux-x86_64-amd-zen3-17796969570.tar.zst

leads to the same crash. Will try the same workaround.

@casparvl

Contributor

Btw, the workaround I do is:

[bot@login1 160218]$ cp _bot_job160218.result _bot_job160218.result.bak

and then modify the _bot_job160218.result so that the status is FAILED:

[bot@login1 160218]$ cat _bot_job160218.result | grep status
status = FAILED
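The manual steps above (back up the result file, then set its status to FAILED) could be scripted roughly as follows. This is a sketch under assumptions: it treats the result file as plain text containing a `status = ...` line, as the `grep` output suggests, and makes no claim about the rest of the file's format. The helper name `mark_job_failed` is hypothetical.

```python
import re
import shutil
from pathlib import Path

# Matches a whole "status = <value>" line; only spaces and tabs are allowed
# around the "=" so the match never runs into the next line.
STATUS_LINE = re.compile(r"^([ \t]*status[ \t]*=[ \t]*).*$", re.MULTILINE)


def mark_job_failed(result_file):
    """Back up result_file to <name>.bak, then rewrite its status line to FAILED."""
    path = Path(result_file)
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        # Keep the first backup so the original status can always be restored.
        shutil.copy2(path, backup)
    text = path.read_text()
    new_text, count = STATUS_LINE.subn(r"\g<1>FAILED", text)
    if count == 0:
        raise ValueError(f"no 'status = ...' line found in {path}")
    path.write_text(new_text)
    return backup
```

To undo the workaround, copy the `.bak` file back over the result file.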

@casparvl casparvl added bot:deploy Ask bot to deploy built tarballs for compat layer and removed bot:deploy Ask bot to deploy built tarballs for compat layer labels Jun 30, 2026
@casparvl

Contributor

Ugh, there are more... It sees 17 successful jobs

[20260630-T16:28:56] determine_successful_jobs(): SUCCESSFUL job in '/project/def-users/SHARED/jobs/2026.06/pr_235/168053'
[20260630-T16:28:56] determine_artefacts_to_deploy(): num successful jobs 17

a number of which seem to be eessi-2023.06 tarballs on a range of CPU architectures. I guess we need to apply the workaround to all.
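To see up front which job directories would need the workaround, the PR's job directory could be scanned for tarballs whose names do not start with the expected prefix. This is a minimal sketch, assuming a layout of `<pr_dir>/<job_id>/` with the tarballs directly inside each job directory, as the shell prompts in this thread suggest. The function name `jobs_with_foreign_tarballs` and the prefix argument are hypothetical.

```python
from pathlib import Path

TARBALL_SUFFIXES = (".tar.gz", ".tar.zst")


def jobs_with_foreign_tarballs(pr_dir, expected_prefix):
    """Map job directory name -> tarballs in it that lack expected_prefix."""
    hits = {}
    for job_dir in sorted(Path(pr_dir).iterdir()):
        if not job_dir.is_dir():
            continue
        foreign = sorted(
            p.name
            for p in job_dir.glob("eessi-*")
            # Only real tarballs; this skips .sig and .bak companions.
            if p.name.endswith(TARBALL_SUFFIXES)
            and not p.name.startswith(expected_prefix)
        )
        if foreign:
            hits[job_dir.name] = foreign
    return hits
```

For this PR the call might look like `jobs_with_foreign_tarballs("/project/def-users/SHARED/jobs/2026.05/pr_235", "eessi-2026.06-compat-")`, with the prefix being an assumption about what a compat-layer tarball for 2026.06 is named.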

@casparvl

Contributor

Ok, did the same for jobs 160219 to 160228... let's see how it goes now.

@casparvl casparvl added bot:deploy Ask bot to deploy built tarballs for compat layer and removed bot:deploy Ask bot to deploy built tarballs for compat layer labels Jun 30, 2026
@casparvl

Contributor

Now we're getting somewhere. #235 (comment) just uploaded, the other one is on the way. I'll have to go for dinner. But I fully expect these to show up in the staging PR on the next pass by the Stratum 0.

@boegel

boegel commented Jul 1, 2026

Contributor

@casparvl Since the staging PR is merged, should we merge this as is?

@boegel boegel changed the title from "[WIP] Compatibility layer for EESSI version 2026.x" to "Compatibility layer for EESSI version 2026.06" on Jul 1, 2026
