-
Notifications
You must be signed in to change notification settings - Fork 963
Fix file extension for module PBMARKDUP #9784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refines the PBMARKDUP module’s handling of input file extensions so that outputs follow the tool’s convention when given multiple input types, including compressed FASTA/FASTQ, and updates the associated tests and snapshots accordingly.
Changes:
- Update
PBMARKDUPprocess logic to derive the output suffix from the actual input filenames, including correctly preserving compound extensions such as.fastq.gzand.fasta.gz. - Adjust nf-test specifications for
pbmarkdupto cover multiple input patterns (single FASTA, multiple BAMs, and mixed FASTA/FASTQ inputs with dupfile+log options). - Regenerate the nf-test snapshot file with new test names, expected outputs (including dupfile and log channels), and newer
nf-testand Nextflow version metadata.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
modules/nf-core/pbmarkdup/main.nf |
Changes suffix calculation to search the input list for FASTA/FASTQ filenames (including .gz), derive the appropriate extension fragment, and fall back to the first input’s extension, ensuring the module’s single-output filename matches the primary input type. |
modules/nf-core/pbmarkdup/tests/main.nf.test |
Renames and restructures tests to exercise multiple input types and the dupfile/log behaviour, including a new mixed FASTA/FASTQ test, while keeping stub and single-input FASTA coverage. |
modules/nf-core/pbmarkdup/tests/main.nf.test.snap |
Updates snapshot keys and expected outputs to align with the new test names and behaviours (including null.dup.bam dupfile outputs and test.pbmarkdup.log logs), and records the newer nf-test/Nextflow versions and timestamps. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pbmarkdup_args = "--clobber --dup-file ${prefix}.dup.bam --log-level INFO" | ||
| } |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here pbmarkdup_args uses ${prefix} inside a Groovy double-quoted string, but prefix is not defined in this nf-test parameter scope, so it interpolates to the literal string null (reflected by the null.dup.bam filenames in the snapshot). To avoid this surprising behaviour and better reflect the intention of tying the duplicate file name to the module’s output prefix, consider either using an explicit fixed dup file name here or passing the desired prefix via task.ext.prefix instead of relying on ${prefix} in the params string.
| // To allow multiple input types/files: (compressed) fasta, fastq, bam; Determine suffix from input file names | ||
| suffix = | ||
| input.find { | ||
| it.name ==~ /.*\.(fasta|fa|fna)(\.gz)?$/ }?.with { f -> | ||
| f.name.tokenize('.').takeRight(f.name.endsWith('.gz') ? 2 : 1).join('.') | ||
| } ?: | ||
| input.find { it.name ==~ /.*\.(fastq|fq)(\.gz)?$/ }?.with { f -> | ||
| f.name.tokenize('.').takeRight(f.name.endsWith('.gz') ? 2 : 1).join('.') | ||
| } ?: | ||
| input[0].extension |
Copilot
AI
Jan 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new suffix resolution logic adds explicit handling for compressed FASTA/FASTQ inputs (e.g. .fasta.gz, .fastq.gz), but the updated pbmarkdup tests only cover uncompressed fasta/fastq and BAM inputs, so the .gz branch isn’t exercised. Given this module already has nf-test coverage, it would be worthwhile to add at least one test using a .fasta.gz or .fastq.gz input to verify that the output file name preserves the full compressed extension as intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with a test with fastq.gz. However, it seem output fastq.gz from pbmarkdup is actually not a gz file, just a normal fastq. I have included gzip step for .gz input, otherwise, md5sum for this mal-formed fastq.gz is unstable @SPPearce
|
|
||
| params { | ||
| pbmarkdup_args = "--clobber --rmdup" | ||
| pbmarkdup_args = "--clobber --dup-file ${prefix}.dup.bam --log-level INFO" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a closure to be able to use the prefix I think.
If it doesn't work, then just hardcode it, it'd normally be set with a closure.
| pbmarkdup_args = "--clobber --dup-file ${prefix}.dup.bam --log-level INFO" | |
| pbmarkdup_args = { "--clobber --dup-file ${meta.id}.dup.bam --log-level INFO" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I can not used both of it, possibly because args is now stated in main.nf.test then included in nextflow.config (it is compile before, when meta is still a null object ?!).
Thus I have just simply put a string as a name for dup file. Users can use meta.id or ext.prefix if they specify it right in config block.
SPPearce
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AI comments are sensible too.
| { | ||
| "id": "test" | ||
| }, | ||
| "/nfs/treeoflife-01/teams/tolit/users/hh14/nf-core_modules/.nf-test/tests/7b69d200fff49e3e41e85e8d0eb38b36/work/c9/b442fc7d5028eeb2cf319b0363140b/test.dup.fastq.gz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an nf-test error, it happens when you touch a .gz file. It needs to be echo | gzip > test.dup.fastq.gz instead to prevent this from happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix file extension to:
PR checklist
Closes #XXX
topic: versions- See version_topicslabelnf-core modules test <MODULE> --profile dockernf-core modules test <MODULE> --profile singularitynf-core modules test <MODULE> --profile condanf-core subworkflows test <SUBWORKFLOW> --profile dockernf-core subworkflows test <SUBWORKFLOW> --profile singularitynf-core subworkflows test <SUBWORKFLOW> --profile conda