New module: exomiser/analyse #11023

Draft
matthdsm wants to merge 2 commits into master from module/exomiser

Conversation

@matthdsm
Contributor

POC module PR for exomiser.
I'd like some more eyes on this before I put more time towards it to figure out what's the best way to handle reference data and inputs.

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@matthdsm matthdsm self-assigned this Mar 23, 2026
Member

@pinin4fjords pinin4fjords left a comment

Minor bits and pieces, but the general approach looks reasonable to me. Exomiser seems to require those versions, so they shouldn't go in ext.args, and it makes sense to put them in the tuples as you've done.

Comment on lines +8 to +10
tuple val(meta), path(vcf), path(ped), val(assembly), path(phenopacket), path(analysis_script)
tuple val(meta2), path(reference_cache, stageAs: 'exomiser_data/*'), val(reference_version)
tuple val(meta3), path(phenotype_cache, stageAs: 'exomiser_data/*'), val(phenotype_version)
Member

This stuff looks reasonable to me; I don't really have a better idea.
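
For readers unfamiliar with why the versions are required inputs: as far as I understand it, Exomiser locates its cached data through a data directory plus version-stamped properties. Below is a minimal Python sketch of that mapping. It is not code from this PR. The helper name `exomiser_properties` and the version string `2402` are hypothetical, and the property names are my assumption based on Exomiser's `application.properties` convention, so they should be checked against the Exomiser documentation.

```python
def exomiser_properties(data_dir, assembly, reference_version, phenotype_version):
    """Render Exomiser-style properties for a single genome assembly.

    Assumption: property names mirror Exomiser's application.properties
    (exomiser.data-directory, exomiser.<assembly>.data-version,
    exomiser.phenotype.data-version). Verify against the Exomiser docs.
    """
    if assembly not in ("hg19", "hg38"):
        raise ValueError(f"unsupported assembly: {assembly!r}")
    lines = [
        f"exomiser.data-directory={data_dir}",
        f"exomiser.{assembly}.data-version={reference_version}",
        f"exomiser.phenotype.data-version={phenotype_version}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "exomiser_data" matches the stageAs prefix in the inputs above;
    # "2402" is a made-up example version, not a recommendation.
    print(exomiser_properties("exomiser_data", "hg38", "2402", "2402"), end="")
```

Because both caches are staged under the same `exomiser_data/` prefix, a single data-directory value can cover the reference and phenotype data, with the versions selecting the right subdirectories.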

tuple val(meta), path("*.json"), emit: json
tuple val(meta), path("*.html"), emit: html
tuple val(meta), path("*.parquet"), emit: parquet
tuple val(meta), path("*.vcf"), emit: vcf
Contributor

I would say we need to bgzip this VCF before we output it.

Contributor Author

I agree. The thing is, I'm only just testing this tool and I haven't had the opportunity to look further into it.
Chances are it's already compressed and I just missed it in the docs

Contributor

@famosab famosab Apr 2, 2026

The VCF file is tabix-indexed and exomiser ranked alleles can be extracted using grep

Yes, it's bgzipped (otherwise it cannot be indexed, afaik), so perfect.
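
If anyone wants to confirm this on an actual output file without relying on the docs, here is a minimal Python sketch that checks for the BGZF block header (a gzip member with an extra subfield tagged `BC`). The function name `is_bgzf` is hypothetical and this is only an illustration, not part of the module. In practice `htsfile` or `bgzip -t` from htslib would do the same job.

```python
import struct

# 28-byte empty BGZF block that htslib appends as an end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600"  # magic+flags, mtime, xfl+os, xlen=6
    "4243" "0200" "1b00"                 # subfield 'BC', length 2, block size - 1
    "0300" "00000000" "00000000"         # empty deflate stream, CRC32, ISIZE
)


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        # gzip magic (1f 8b), deflate method (08), FEXTRA flag (bit 0x04) set.
        if len(header) < 12 or header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    # Walk the gzip extra subfields looking for the BGZF 'BC' tag (length 2).
    offset = 0
    while offset + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, offset)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        offset += 4 + slen
    return False
```

A plain `gzip` file fails this check because it lacks the extra subfield, which is also why tabix cannot index it.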

Comment on lines +9 to +10
tuple val(meta2), path(reference_cache, stageAs: 'exomiser_data/*'), val(reference_version)
tuple val(meta3), path(phenotype_cache, stageAs: 'exomiser_data/*'), val(phenotype_version)
Contributor

Would it maybe make sense to have a separate module that takes care of properly loading this data? That is what we did for PCGR. It would be exomiser/getreference, which would download the data and create the required directory (exomiser.data-directory=/data/exomiser-data), which can then simply be an input to this module.

Contributor Author

IMO the data is too big to be loaded on the fly. It comes down to about 50 GB of reference data in total.

Contributor

Yes, that's why I would handle it separately, or at least that is how we are doing it with the VEP cache etc. Either we add the data to the VEP cache thingy @maxulysse built, or we create a module that can be used in a pipeline to load it (see PCGR in the variantprioritization pipeline). Then for testing we subsample this cache to chr22 etc. (I did that for PCGR as well).

Contributor

see #9295

3 participants