Anarcii #11058

Open
Clara0611 wants to merge 25 commits into nf-core:master from Clara0611:anarcii

Clara0611 commented Mar 26, 2026

nf-core/modules pull request

PR checklist

Closes #10805

Adds a new module, ANARCII, for automatically numbering TCR and antibody sequences using a language model.
Tool GitHub: https://github.com/oxpig/ANARCII

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs?
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda

Clara0611 marked this pull request as draft on March 26, 2026, 14:52
Clara0611 (Author) commented

The tests now assert only the versions and the run-independent portion of the header line, because the computed scores differ slightly between architectures (observed here from the fourth decimal place onward). The ANARCII developers note this as well: "[...] However, we have observed that they show minor variation on different architectures and versions of python/torch in a small number of sequences. These differences are minimal." (source: https://github.com/oxpig/ANARCII/wiki/FAQs#understanding-sequence-scores, retrieved 30.03.26).
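
To illustrate the kind of variation described above, here is a minimal plain-Python sketch (not nf-test code and not part of this PR) that treats two result files as equivalent when every non-score field matches exactly and the scores agree within a tolerance. The function name `rows_match`, the tolerance of 1e-3, and the sample rows are all assumptions made up for illustration; only the five column names come from this thread.

```python
import csv
import io
import math


def rows_match(csv_a, csv_b, score_col="Score", abs_tol=1e-3):
    """Compare two CSV texts: exact match everywhere except score_col,
    which only has to agree within abs_tol (hypothetical helper)."""
    reader_a = csv.DictReader(io.StringIO(csv_a))
    reader_b = csv.DictReader(io.StringIO(csv_b))
    rows_a, rows_b = list(reader_a), list(reader_b)
    # Headers and row counts must be identical.
    if reader_a.fieldnames != reader_b.fieldnames or len(rows_a) != len(rows_b):
        return False
    for a, b in zip(rows_a, rows_b):
        for key in a:
            if key == score_col:
                # Scores may drift slightly between architectures.
                if not math.isclose(float(a[key]), float(b[key]), abs_tol=abs_tol):
                    return False
            elif a[key] != b[key]:
                return False
    return True


# Made-up rows that differ only in the fourth decimal of the score.
run_one = "Name,Chain,Score,Query start,Query end\nseq1,H,27.1234,0,120\n"
run_two = "Name,Chain,Score,Query start,Query end\nseq1,H,27.1236,0,120\n"
print(rows_match(run_one, run_two))
```

A tolerance check like this would catch real regressions in the scores while ignoring platform noise, at the cost of having to pick a threshold; asserting only the header, as done in this PR, avoids that choice.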

Clara0611 marked this pull request as ready for review on March 30, 2026, 09:45
assertAll(
{ assert snapshot(
process.out.versions,
file(process.out.anarcii.get(0).get(1)).readLines()[0].contains("Name,Chain,Score,Query start,Query end")
Contributor

Maybe you could make this assertion easier to read by using nft-csv to check the header? https://github.com/lukfor/nft-csv#columnnames


conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' ?
'oras://community.wave.seqera.io/library/python_pip_anarcii:702a76f5b5d01657' :
Contributor

Please use an https URL instead of an oras URL here: nf-co.re/docs/tutorials/nf-core_components/using_seqera_containers

Author

Thanks for reviewing, both done :)

famosab (Contributor) commented Mar 31, 2026

Did you already manage to add the test-data to the test-datasets repository? :)

Comment on lines +27 to +32
def anarciiResults = path(process.out.anarcii.get(0).get(1)).csv
assert "Name" in anarciiResults.columnNames
assert "Chain" in anarciiResults.columnNames
assert "Score" in anarciiResults.columnNames
assert "Query start" in anarciiResults.columnNames
assert "Query end" in anarciiResults.columnNames
Contributor

Suggested change
- def anarciiResults = path(process.out.anarcii.get(0).get(1)).csv
- assert "Name" in anarciiResults.columnNames
- assert "Chain" in anarciiResults.columnNames
- assert "Score" in anarciiResults.columnNames
- assert "Query start" in anarciiResults.columnNames
- assert "Query end" in anarciiResults.columnNames
+ assert path(process.out.anarcii[0][1]).csv.columnNames == ["Name", "Chain", "Score", "Query start", "Query end"]

Or are there optionally more column names?

Author

There are 128+ other columns corresponding to the numbering, and which ones appear depends on the dataset. I can find out the exact numbering this test dataset produces (it sometimes includes alternatives such as 112A, 112B, etc.), but I thought that listing all of those columns would be hard to read and would represent only this specific test dataset, not the tool. If you feel I should include them, though, that is possible.
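
The trade-off discussed here, checking that the fixed columns are present without pinning the dataset-dependent numbering columns, can be sketched in plain Python as a subset check on the header. This is only an illustration outside nf-test: the helper name `missing_columns` and the sample header (including the numbering columns shown) are hypothetical; only the five fixed column names come from this thread.

```python
import csv
import io

# Fixed columns named in this thread; the numbering columns vary by dataset.
REQUIRED_COLUMNS = ["Name", "Chain", "Score", "Query start", "Query end"]


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header
    (hypothetical helper; extra columns are ignored)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in required if name not in header]


# Made-up header: the five fixed columns plus a few illustrative numbering columns.
sample = "Name,Chain,Score,Query start,Query end,1,2,112,112A,112B\n"
print(missing_columns(sample))  # → []
```

An exact equality check on the full column list would fail as soon as a different input produced a different set of numbering columns, whereas the subset check only depends on the columns the tool always writes.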

Contributor

Are you still struggling with unstable md5sums even after updating the test data?

Author

Yes, unfortunately. While the scores look better, they are still not 100% stable across systems.

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
Clara0611 and others added 4 commits April 1, 2026 14:32
Development

Successfully merging this pull request may close these issues.

new module: ANARCII

4 participants