Add caalm by vagkaratzas · Pull Request #11087 · nf-core/modules

vagkaratzas · 2026-03-30T10:43:11Z

New protein annotation software for CAZyme prediction from amino acid sequences.
The PR includes the setup module needed to download models from Hugging Face.

The Bioconda recipe review process is slow, so for now this uses a pip installation (environment and containers built through Seqera Containers).

This is the CPU version. I will probably create a separate GPU module (dramatic speed increase) after this gets merged.

PR checklist

vagkaratzas · 2026-03-30T11:01:24Z

The test_level*_embeddings outputs seem to change between my local machine and the GitHub runners; investigating.

vagkaratzas · 2026-03-30T11:20:35Z


I ran some more tests: the outputs stay the same within a system but change depending on the CPU model, probably because of non-determinism in the FAISS index search or the ESM embedding computation, not because of a container/environment difference.

Likely causes (suggested by the bot; I excluded one that didn't make sense for this case):

1. CPU architecture / SIMD instructions: FAISS (and the PyTorch CPU kernels that compute the ESM embeddings) use AVX2/AVX-512 on modern CPUs and fall back to SSE4 or scalar code on older ones. Floating-point operations are ordered differently depending on the SIMD path, producing subtly different embeddings and distances due to floating-point non-associativity. GitHub runners use different CPU generations than your local machine.

2. FAISS approximate nearest-neighbour search: if Level 2 uses an IVF or HNSW index, the search is approximate, and the results can differ when the hardware SIMD path changes, even with the same query vectors, because slightly different distance values can change which candidates are explored and how near-ties are ranked.

The fact that the output is stable on the same machine across Singularity/Docker/Conda indicates it is not a software version or library issue: the same binary code hits the same SIMD path on the same CPU. Across platforms (your machine → GitHub runner), however, the CPU capabilities differ, which changes the low-level floating-point execution path.
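Cause 1 can be illustrated without FAISS or PyTorch at all. The sketch below is a minimal plain-Python illustration under stated assumptions, not CAALM, FAISS or ESM code: `lane_sum` is a hypothetical helper that imitates a SIMD reduction by splitting a sum across a number of lanes, so a different lane count means a different order of additions. `vectors_close`, with its `rel_tol`/`abs_tol` values, is likewise a hypothetical helper showing a tolerance-based comparison; it is not what this PR's nf-test actually does. Python floats are float64; PyTorch computes in float32 by default, where the same effect is much larger relative to the values.

```python
import math
import random


def sequential_sum(xs):
    """Left-to-right accumulation, like a scalar (non-SIMD) loop."""
    total = 0.0
    for x in xs:
        total += x
    return total


def lane_sum(xs, lanes):
    """Hypothetical SIMD-style reduction: `lanes` partial sums, combined at the end.

    A wider vector unit (e.g. AVX-512 vs SSE4) corresponds to more lanes,
    hence a different order of floating-point additions.
    """
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x
    return sequential_sum(partial)


def vectors_close(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """Hypothetical tolerance-based comparison of two embedding vectors."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


if __name__ == "__main__":
    # Addition is not associative: the grouping changes the rounded result.
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

    rng = random.Random(42)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]
    exact = math.fsum(xs)  # correctly rounded reference sum
    sums = {lanes: lane_sum(xs, lanes) for lanes in (1, 4, 8, 16)}
    for lanes, s in sums.items():
        print(f"lanes={lanes:2d}  sum={s!r}  error={s - exact:+.2e}")

    # Bit-exact equality across "CPUs" is fragile; a tolerance check is not.
    print(vectors_close(list(sums.values()), [exact] * len(sums)))  # True
```

Exact-checksum snapshots of embeddings therefore tend to break across CPU generations, while a tolerance check on the values stays stable. The same rounding noise also matters for search: when two hits score within that noise of each other, their ranking can flip between machines.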

modules/nf-core/caalm/caalm/tests/main.nf.test

modules/nf-core/caalm/caalm/main.nf

jfy133 · 2026-03-31T15:04:49Z

modules/nf-core/caalm/downloadmodels/main.nf

+    path("models/level0"), emit: level0
+    path("models/level1"), emit: level1
+    path("models/level2"), emit: level2


I would put these all in one tuple: they are all related and can't be used independently (similar to a .bam and its .bai). That way you don't need any .combine shenanigans in this case.

Also agreed. Coming up

vagkaratzas added 8 commits March 30, 2026 09:28

caalm init - seqera containers and env from pip

233f127

main and meta init

38b925d

nf-test init

878afda

switch to CAALM_CAALM and expect a second input of downloaded model

498f086

switch to test-datasets file

133c344

caalm_downloadmodels module

8bb8aa9

nf-tests pass

54b884f

meta update and warning prevention during downloads

1abe8ba

github-actions bot added the size/l label Mar 30, 2026

Merge branch 'master' into add-caalm

e029e72

export log, update tests to pass across platforms

39baa3e

jfy133 reviewed Mar 31, 2026

modules/nf-core/caalm/caalm/tests/main.nf.test (resolved)

modules/nf-core/caalm/caalm/main.nf (outdated, resolved)

modules/nf-core/caalm/caalm/main.nf (resolved)

vagkaratzas and others added 2 commits March 31, 2026 11:32

Merge branch 'master' into add-caalm

df79189

task.cpus added

42f4f04

vagkaratzas requested a review from jfy133 March 31, 2026 10:44

vagkaratzas and others added 3 commits March 31, 2026 12:27

model levels split

74ffac4

extra versions reported; python, torch, faiss

8516688

Merge branch 'master' into add-caalm

b692b24

jfy133 reviewed Mar 31, 2026

vagkaratzas and others added 2 commits March 31, 2026 16:19

downloadmodels output in tuple

66009e8

Merge branch 'master' into add-caalm

7710278

vagkaratzas requested a review from jfy133 March 31, 2026 15:19

jfy133 approved these changes Mar 31, 2026

vagkaratzas added this pull request to the merge queue Mar 31, 2026

Merged via the queue into master with commit 2153095 Mar 31, 2026
29 checks passed

vagkaratzas deleted the add-caalm branch March 31, 2026 15:34

vagkaratzas merged 17 commits into master from add-caalm

2 participants
