
chore: re-label process_low to fastqc #9814

Merged
SPPearce merged 2 commits into nf-core:master from riverxdata:chore/fastqc on Feb 3, 2026


nttg8100 (Contributor) commented Feb 1, 2026

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

Description:
After benchmarking FastQC with different thread settings, it appears that FastQC does not follow the modern parallelization model where increasing the number of threads for a single dataset reduces runtime. Instead, its threading model is closer to a GNU parallel wrapper: users are expected to set the number of threads to match the number of FastQC input files, allowing multiple files to be processed in parallel.
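To make this threading model concrete, here is a minimal Python sketch of file-level parallelism. It assumes each input file is handled by exactly one thread and that every file takes the same (hypothetical) time; the function `fastqc_wall_time` is illustrative only and is not part of FastQC or nf-core.

```python
import math


def fastqc_wall_time(n_files: int, threads: int, per_file_minutes: float) -> float:
    """Estimate wall time under an assumed file-level threading model.

    Assumption: one thread processes one whole file, so at most
    min(threads, n_files) files run concurrently and extra threads sit idle.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if n_files <= 0:
        return 0.0
    concurrent = min(threads, n_files)
    # Files are processed in "waves" of `concurrent` files each.
    waves = math.ceil(n_files / concurrent)
    return waves * per_file_minutes


if __name__ == "__main__":
    # Hypothetical paired-end sample: 2 files, 5 minutes per file.
    for t in (1, 2, 4, 8):
        print(f"threads={t}: {fastqc_wall_time(2, t, 5.0)} min")
```

Under these assumptions, a paired-end sample finishes in a single wave once two threads are available, so raising the thread count from 2 to 8 leaves the estimated wall time unchanged.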

In practice, however, FastQC rarely processes more than two FASTQ files at once in typical pipelines. As a result, for paired-end samples (two FASTQ files) or single-end samples (one file), a low-resource configuration (e.g. process_low) is sufficient and does not increase overall runtime. FastQC is used for quality control in many pipelines, so allocating it more than process_low wastes resources.
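The resource cost of over-allocating can be estimated with a minimal Python sketch. The CPU counts below are assumed to match the defaults in the nf-core pipeline template's `conf/base.config` (individual pipelines can override them), and `idle_cpus` is a hypothetical helper that counts CPUs a task reserves but cannot use if FastQC runs at most one thread per input file.

```python
# Assumed default CPU counts for nf-core resource labels (nf-core pipeline
# template conf/base.config); individual pipelines may override these values.
LABEL_CPUS = {"process_single": 1, "process_low": 2, "process_medium": 6}


def idle_cpus(label: str, n_files: int) -> int:
    """Return CPUs reserved by `label` that stay idle when FastQC uses
    at most one thread per input file (the assumed threading model)."""
    cpus = LABEL_CPUS[label]
    busy = min(cpus, n_files)
    return cpus - busy


if __name__ == "__main__":
    n_samples = 100  # hypothetical number of paired-end samples
    for label in ("process_low", "process_medium"):
        per_task = idle_cpus(label, n_files=2)
        print(f"{label}: {per_task} idle CPU(s) per task, "
              f"{per_task * n_samples} across {n_samples} samples")
```

With these assumed values, a paired-end FastQC task under process_low leaves no reserved CPUs idle, while a 6-CPU label would leave 4 of its 6 CPUs unused in every task. Memory is not modeled here.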

Quick benchmark result: on the same dataset, increasing CPUs and memory does not reduce runtime at all.

For details, see this blog post:
https://riverxdata.github.io/river-docs/blog/bioinformatics-computing-resource-optimization-part1

SPPearce (Contributor) commented Feb 3, 2026

Can you do the same test for FALCO, since it is a newer reimplementation?

nttg8100 (Contributor, Author) commented Feb 3, 2026

I can do it, but I prefer to handle each module in its own branch. I will open a separate PR later if I find a similar issue.

SPPearce (Contributor) commented Feb 3, 2026

Yes, I wasn't suggesting to do it as part of this PR.

@SPPearce SPPearce added this pull request to the merge queue Feb 3, 2026
Merged via the queue into nf-core:master with commit 3009f27 Feb 3, 2026
59 checks passed
cavenel pushed a commit to cavenel/modules that referenced this pull request Feb 5, 2026