Conversation
| // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions. | ||
| tuple val("${task.process}"), val('bigslice'), val("2.0.2"), topic: versions, emit: versions_bigslice |
There was a problem hiding this comment.
working on it rn sorry
jfy133
left a comment
There was a problem hiding this comment.
@vagkaratzas any thoughts as well?
modules/nf-core/bigslice/main.nf
Outdated
| def export_tsv_cmd = args2 ? """ | ||
| bigslice \\ | ||
| --export-tsv ${prefix}/result/tsv_export \\ | ||
| --program_db_folder ${hmmdb} \\ | ||
| ${prefix} | ||
| """ : '' |
There was a problem hiding this comment.
I do not understand this implementation, what is args2 trying to do here to activate the execution?
Shouldn't you just have a boolean input value channel that, if true, just injects --export-tsv in to the main command? Why the whole extra command?
There was a problem hiding this comment.
Agreed, ext.args can have more than one param - value pair inside. You don't need a new args* value for each param possible. --export-tsv can be passed and checked through ext.args
There was a problem hiding this comment.
--export-tsv is a separate BiG-SLiCE subcommand that must be called as a second bigslice invocation after clustering completes, so it cannot simply be appended to the main command. Two approaches exist: using ext.args2 to explicitly separate the two commands (clear intent, but adds a non-standard extra args variable), or detecting --export-tsv inside ext.args, stripping it from the main command and running it as a post-step (current approach), which keeps a single ext.args entry point. If you have any other suggestions please tell :')
There was a problem hiding this comment.
OK then yeah I would go for a boolean input channel, and inject the second command (with $args2) if requested.
For variable-based command injection, keep the command on one line rather than across multiple though
There was a problem hiding this comment.
Definitely add a new nf-test case testing this new funcitonality.
I am confused as to which method to choose, because I can see the --export-tsv being used twice; both here --export-tsv ${prefix}/result/tsv_export \\ and in the ${export_tsv_cmd} command afterwards.
I'd definitely skip args2, and parse ext.args for finding it though.
There was a problem hiding this comment.
Given it fundamentally changes the command, I feel input channel is clearer/more explicit
|
no more results folder it will be sample/(bigslice output) fully working parameters with catch for errors if you have any suggestions please tell |
| // Flatten the GBK directory into a list of individual GBK files with meta | ||
| input[0] = UNTAR_GBK.out.untar.map { meta, dir -> | ||
| def gbk_files = [] | ||
| dir.eachFileRecurse { if (it.name.endsWith('.gbk')) gbk_files << it } |
There was a problem hiding this comment.
modules/nf-core/bigslice/main.nf
Outdated
| mv ${prefix}/result/data.db ${prefix}/data.db | ||
| mv ${prefix}/result/tmp ${prefix}/tmp | ||
| rm -rf ${prefix}/result |
There was a problem hiding this comment.
I don't think it's necessary to do these move operations (and actually it's not recommened apparently), youc an just emit prefix as is
famosab
left a comment
There was a problem hiding this comment.
I added a few comments to your PR.
| { assert resultDir.isDirectory() }, | ||
| { assert file("${resultDir}/data.db").exists() }, | ||
| { assert resultDir.list().any { it.endsWith('.fa') || new File("${resultDir}/tmp").exists() } }, | ||
| { assert snapshot( | ||
| process.out.findAll { key, val -> key.startsWith("versions")} | ||
| ).match() } |
There was a problem hiding this comment.
Can you add other files to this snapshot as well? Ideally we want all outputs to be at least present by name in the snapshot.
| tuple val(meta), path(bgc, stageAs: 'bgc_files/s*/*') | ||
| path(hmmdb) | ||
| val(export_tsv) |
There was a problem hiding this comment.
Can we put all these inputs into one tuple? That will make sure you're sure that EVERY time everything comes together in the right combination.
There was a problem hiding this comment.
hmmdb is a shared reference database (not sample-specific) and export_tsv is a boolean flag (not a file), so neither belongs in the sample tuple. In nf-core, tuples group a meta map with the data files of that specific sample mixing in shared resources or behaviour flags would break this convention.
| --program_db_folder ${hmmdb} \\ | ||
| ${prefix} | ||
|
|
||
| ${export_tsv_cmd} |
There was a problem hiding this comment.
Should we add this tsv then as optional output and have it be accessible for downstream analyses?
|
@famosab |
PR checklist
Closes #XXX
topic: versions- See version_topicslabelnf-core modules test <MODULE> --profile dockernf-core modules test <MODULE> --profile singularitynf-core modules test <MODULE> --profile condanf-core subworkflows test <SUBWORKFLOW> --profile dockernf-core subworkflows test <SUBWORKFLOW> --profile singularitynf-core subworkflows test <SUBWORKFLOW> --profile conda