Add snpclustering subworkflow #11059

Open
dbaku42 wants to merge 2 commits into nf-core:master from dbaku42:add/snpclustering

dbaku42 commented Mar 26, 2026

Description

This PR adds the snpclustering subworkflow for end-to-end unsupervised clustering of genomic samples directly from multi-sample VCF files.

Features

  • Variant filtering (MAF + missingness) with bcftools/filter
  • LD pruning with plink2/indeppairwise
  • Export pruned VCF with plink2/recodevcf
  • PCA with flashpca2
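
As a rough illustration of the two criteria in the variant filtering step, here is a minimal Python sketch of MAF and missingness filtering on a toy genotype table. It does not show how bcftools/filter is invoked in the subworkflow. The thresholds (`min_maf=0.05`, `max_missing=0.1`), the function names, and the toy variants are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Alt-allele count per diploid sample: 0, 1, 2, or None when the call is missing.
Genotype = Optional[int]


def maf(calls: List[Genotype]) -> float:
    """Minor allele frequency computed over the non-missing calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def missingness(calls: List[Genotype]) -> float:
    """Fraction of samples without a call at this variant."""
    return sum(g is None for g in calls) / len(calls)


def keep_variant(calls: List[Genotype],
                 min_maf: float = 0.05,      # assumed threshold
                 max_missing: float = 0.1    # assumed threshold
                 ) -> bool:
    """Keep a variant only if it passes both filters."""
    return missingness(calls) <= max_missing and maf(calls) >= min_maf


# Toy data (hypothetical): one common SNP, one monomorphic site, one sparse site.
variants: Dict[str, List[Genotype]] = {
    "snp_common": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "snp_sparse": [1, None, None, 2, None, 1, 0, None, 1, 2],
}

kept = [name for name, calls in variants.items() if keep_variant(calls)]
print(kept)  # prints ['snp_common']
```

Only `snp_common` survives: `snp_rare` has a MAF of 0 and `snp_sparse` is missing in 4 of 10 samples.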

The subworkflow was developed in relation to the accepted nf-core proposal for the consepopgen pipeline.

Related to:

Checklist

  • nf-core subworkflows lint snpclustering passed
  • nf-core subworkflows test snpclustering passed
  • Follows nf-core subworkflow conventions

Closes # (no specific issue)

Contributor

famosab commented Apr 2, 2026

Please join the nf-core organization on GitHub so that the CI tests can run on your PR. You can request to join via the #github-invitations channel in the nf-core Slack, which you can join at https://nf-co.re/join.

Contributor

famosab left a comment

Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.

}

then {
assert workflow.success
Contributor

We also want a snapshot here (look at other subworkflows)

Contributor

Not needed anymore

missing

main:
versions = Channel.empty()
Contributor

Check for each module whether it still exports versions; I think at least bcftools/filter no longer does.

- vcf:
    type: file
    description: "Multi-sample VCF file (bgzipped and indexed)"
    pattern: "*.{vcf,vcf.gz}"
Contributor

Suggested change
- pattern: "*.{vcf,vcf.gz}"
+ pattern: "*.vcf.gz"

FLASHPCA2 ( PLINK2_RECODE_VCF.out.vcf )
versions = versions.mix(FLASHPCA2.out.versions.first())

// TODO: here we will add KMeans/DBSCAN/plot once we create the local modules
Contributor

Is there still something to add?
