Big slice imp modules by SkyLexS · Pull Request #11034 · nf-core/modules

SkyLexS · 2026-03-24T12:02:08Z

PR checklist

Closes #XXX

camlloyd · 2026-03-24T12:08:52Z

modules/nf-core/bigslice/main.nf

-    // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
-    tuple val("${task.process}"), val('bigslice'), val("2.0.2"), topic: versions, emit: versions_bigslice


Please keep this

working on it rn sorry

jfy133

@vagkaratzas any thoughts as well?

jfy133 · 2026-03-31T07:56:50Z

modules/nf-core/bigslice/main.nf

+    def export_tsv_cmd = args2 ? """
+    bigslice \\
+        --export-tsv ${prefix}/result/tsv_export \\
+        --program_db_folder ${hmmdb} \\
+        ${prefix}
+    """ : ''


I do not understand this implementation, what is args2 trying to do here to activate the execution?

Shouldn't you just have a boolean input value channel that, if true, injects --export-tsv into the main command? Why the whole extra command?

Agreed, ext.args can have more than one param - value pair inside. You don't need a new args* value for each param possible. --export-tsv can be passed and checked through ext.args

--export-tsv is a separate BiG-SLiCE subcommand that must be called as a second bigslice invocation after clustering completes, so it cannot simply be appended to the main command. Two approaches exist: using ext.args2 to explicitly separate the two commands (clear intent, but adds a non-standard extra args variable), or detecting --export-tsv inside ext.args, stripping it from the main command and running it as a post-step (current approach), which keeps a single ext.args entry point. If you have any other suggestions please tell :')

OK then yeah I would go for a boolean input channel, and inject the second command (with $args2) if requested.
For variable-based command injection, keep the command on one line rather than spreading it across multiple lines though.

Definitely add a new nf-test case testing this new functionality.

I am confused as to which method to choose, because I can see the --export-tsv being used twice; both here --export-tsv ${prefix}/result/tsv_export \\ and in the ${export_tsv_cmd} command afterwards.

I'd definitely skip args2, and parse ext.args for finding it though.

Given it fundamentally changes the command, I feel input channel is clearer/more explicit
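A minimal sketch of the boolean-input approach suggested in this thread, reusing the flags and inputs quoted in the diffs (`--export-tsv`, `--program_db_folder`, `val(export_tsv)`). The emit name `outdir` and the `args` placement are assumptions for illustration, and the input options of the clustering run are left out because they are not shown in the quoted diff. This is not the merged module.

```groovy
process BIGSLICE {
    input:
    tuple val(meta), path(bgc, stageAs: 'bgc_files/s*/*')
    path(hmmdb)
    val(export_tsv)

    output:
    // 'outdir' is a hypothetical emit name, not taken from the PR
    tuple val(meta), path("${prefix}"), emit: outdir

    script:
    def args = task.ext.args ?: ''
    prefix   = task.ext.prefix ?: "${meta.id}"
    // Second bigslice invocation, kept on one line and injected only when export_tsv is true
    def export_tsv_cmd = export_tsv ? "bigslice --export-tsv ${prefix}/result/tsv_export --program_db_folder ${hmmdb} ${prefix}" : ''
    // Input options for the clustering run are omitted here; only lines quoted in the diff are reused
    """
    bigslice \\
        ${args} \\
        --program_db_folder ${hmmdb} \\
        ${prefix}

    ${export_tsv_cmd}
    """
}
```

With `export_tsv` set to `false`, `${export_tsv_cmd}` expands to an empty string and only the clustering command runs, so `--export-tsv` never has to be parsed out of `ext.args`.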

SkyLexS · 2026-04-01T11:26:26Z

No more results folder: the output is now sample/(bigslice output). Parameters are fully working, with a catch for errors. If you have any suggestions, please tell.

jfy133 · 2026-04-01T11:37:10Z

modules/nf-core/bigslice/tests/main.nf.test

+                // Flatten the GBK directory into a list of individual GBK files with meta
+                input[0] = UNTAR_GBK.out.untar.map { meta, dir ->
+                    def gbk_files = []
+                    dir.eachFileRecurse { if (it.name.endsWith('.gbk')) gbk_files << it }


This might be sufficient: https://nf-co.re/docs/contributing/nf-test/assertions#snapshotting-variable-files-in-a-channel-emitting-a-directory

jfy133 · 2026-04-01T11:37:14Z

modules/nf-core/bigslice/main.nf

+    mv ${prefix}/result/data.db ${prefix}/data.db
+    mv ${prefix}/result/tmp    ${prefix}/tmp
+    rm -rf ${prefix}/result


I don't think it's necessary to do these move operations (and actually it's apparently not recommended); you can just emit prefix as is.

famosab

I added a few comments to your PR.

famosab · 2026-04-02T13:12:39Z

modules/nf-core/bigslice/tests/main.nf.test

+                { assert resultDir.isDirectory() },
+                { assert file("${resultDir}/data.db").exists() },
+                { assert resultDir.list().any { it.endsWith('.fa') || new File("${resultDir}/tmp").exists() } },
+                { assert snapshot(
+                    process.out.findAll { key, val -> key.startsWith("versions")}
+                ).match() }


Can you add other files to this snapshot as well? Ideally we want all outputs to be at least present by name in the snapshot.

famosab · 2026-04-02T13:12:57Z

modules/nf-core/bigslice/main.nf

+    tuple val(meta), path(bgc, stageAs: 'bgc_files/s*/*')
+    path(hmmdb)
+    val(export_tsv)


Can we put all these inputs into one tuple? That way you can be sure that everything comes together in the right combination every time.

hmmdb is a shared reference database (not sample-specific) and export_tsv is a boolean flag (not a file), so neither belongs in the sample tuple. In nf-core, tuples group a meta map with the data files of that specific sample; mixing in shared resources or behaviour flags would break this convention.
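To illustrate the separate-input layout described in this reply, here is a hedged sketch of how a calling workflow might wire the three inputs. The include path, sample id, and file paths are placeholders invented for the example; only the input order (sample tuple, `hmmdb`, `export_tsv`) comes from the quoted diff.

```groovy
// Placeholder include path; adjust to where the module lives in the pipeline
include { BIGSLICE } from './modules/nf-core/bigslice/main'

workflow {
    // Per-sample queue channel: meta map plus that sample's GBK files (placeholder paths)
    ch_bgc = Channel.of(
        [ [ id: 'sample1' ], [ file('sample1/region1.gbk'), file('sample1/region2.gbk') ] ]
    )

    // Shared reference database as a value channel, so it is reused for every sample
    ch_hmmdb = Channel.value(file('bigslice_models'))

    // Behaviour flag passed as a plain value, which Nextflow treats as a value channel
    BIGSLICE(ch_bgc, ch_hmmdb, true)
}
```

Because `ch_hmmdb` and the flag are value channels, they are paired with every item in `ch_bgc` without any extra `combine` step, which is the usual reason to keep them out of the sample tuple.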

famosab · 2026-04-02T13:13:41Z

modules/nf-core/bigslice/main.nf

        --program_db_folder ${hmmdb} \\
        ${prefix}
+
+    ${export_tsv_cmd}


Should we add this tsv then as optional output and have it be accessible for downstream analyses?
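One way this could look, as a fragment of the process `output:` block. The emit names `outdir` and `tsv` are assumptions, and the TSV path is simply the one passed to `--export-tsv` in the earlier diff; the final module may use a different location.

```groovy
    output:
    // Whole bigslice output folder (hypothetical emit name)
    tuple val(meta), path("${prefix}"), emit: outdir
    // Only present when the export step ran, hence optional: true
    tuple val(meta), path("${prefix}/result/tsv_export"), optional: true, emit: tsv
```

Marking the channel `optional: true` keeps the process from failing when `export_tsv` is false and the folder is never created.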

SkyLexS · 2026-04-02T22:08:11Z

@famosab
The *.fa files in result/tmp/ are non-deterministic: bigslice generates them based on HMM scoring, and the exact set of hits varies between runs/environments due to floating-point differences. Snapshotting their names will always break on CI. So instead we just assert that at least one .fa exists there, which is enough to confirm the tool ran correctly.
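A sketch of how both requests could be combined in the `then` block of the nf-test: snapshot the names of the remaining files, and only assert presence for the `.fa` files. It is a fragment, not a full test file. The emit name `outdir` is hypothetical, and whether every non-`.fa` file name is stable across runs is an assumption that would need checking on CI.

```groovy
then {
    // 'outdir' is a hypothetical emit name; index [0][1] picks the path out of the [meta, path] tuple
    def resultDir = new File(process.out.outdir.get(0).get(1).toString())

    def stableNames = []   // file names assumed to be reproducible
    def faFiles     = []   // .fa files whose exact set varies between runs

    resultDir.eachFileRecurse { f ->
        if (f.isDirectory()) {
            return
        }
        if (f.name.endsWith('.fa')) {
            faFiles << f.name
        } else {
            stableNames << f.name
        }
    }

    assertAll(
        { assert process.success },
        // Presence check only for the non-deterministic files
        { assert !faFiles.isEmpty() },
        // Names only, sorted so the snapshot does not depend on directory traversal order
        { assert snapshot(
            stableNames.sort(),
            process.out.findAll { key, val -> key.startsWith("versions") }
        ).match() }
    )
}
```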

SkyLexS and others added 5 commits March 4, 2026 00:37

Updating bigslice

15d27bd

[automated] Fix linting with Prettier

4edc137

Linting and modifing the bigslice version emiting

2d142cf

Updated the bigslice module

4e0e985

Merge branch 'master' into big_slice_imp_modules

c321854

camlloyd reviewed Mar 24, 2026

fixing versioning

bfedf4c

SkyLexS requested a review from camlloyd March 30, 2026 06:35

jfy133 reviewed Mar 31, 2026

SkyLexS and others added 2 commits April 1, 2026 13:58

resolved export_tsv support and other parameters and fix parameter defaults

f8542ca

Merge branch 'master' into big_slice_imp_modules

572c806

SkyLexS requested a review from jfy133 April 1, 2026 11:16

jfy133 reviewed Apr 1, 2026

SkyLexS and others added 7 commits April 1, 2026 16:11

collect all samples cross-run

4de6d51

Merge branch 'master' into big_slice_imp_modules

ed38681

updating snapshot

cbd886a

Merge branch 'master' into big_slice_imp_modules

f281cf2

updating snapshot yet again

59c156d

updating snapshot

eb1d391

Merge branch 'master' into big_slice_imp_modules

8814ea4

SkyLexS requested a review from jfy133 April 1, 2026 14:35

famosab reviewed Apr 2, 2026

add tsv emit, snapshot file names

90fac06

updated tests

a3e6cb4

6 participants