cellgeni/nf-cluster is a Nextflow pipeline for single-cell ATAC processing from Cell Ranger ARC output directories to integrated clustering and visualization outputs.
The current ATAC workflow performs:
- AMULET doublet calling
- metadata attachment and QC filtering
- tile matrix generation and feature selection
- Scrublet doublet scoring
- on-disk concatenation across samples
- spectral embedding
- RAPIDS neighbors, Leiden clustering, and UMAP
- Scanpy embedding plots colored by Leiden and selected metadata columns
Prepare a sample sheet with the following columns:
sample,path
SAMPLE_A,/path/to/cellranger_arc_count_output_A
SAMPLE_B,/path/to/cellranger_arc_count_output_B

Each path should point to a Cell Ranger ARC output directory containing fragments files (for example, fragments.tsv.gz).
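Checking the sample sheet before launching can catch path typos early. The following is a minimal sketch, not part of the pipeline: the function name `check_sample_sheet` and the `*fragments.tsv.gz` glob are assumptions made for illustration, and the pipeline itself may locate fragments files differently.

```python
import csv
import sys
from pathlib import Path

# Assumption: a fragments file is any file whose name ends in
# "fragments.tsv.gz" somewhere below the given directory.
FRAGMENTS_GLOB = "*fragments.tsv.gz"


def check_sample_sheet(sheet_path):
    """Return a list of human-readable problems found in the sample sheet."""
    problems = []
    seen = set()
    with open(sheet_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != ["sample", "path"]:
            return [f"expected header 'sample,path', found {reader.fieldnames}"]
        for row in reader:
            line = reader.line_num
            sample = (row.get("sample") or "").strip()
            path = (row.get("path") or "").strip()
            if not sample or not path:
                problems.append(f"line {line}: empty sample or path")
                continue
            if sample in seen:
                problems.append(f"line {line}: duplicate sample '{sample}'")
            seen.add(sample)
            run_dir = Path(path)
            if not run_dir.is_dir():
                problems.append(f"line {line}: '{path}' is not a directory")
            elif not any(run_dir.rglob(FRAGMENTS_GLOB)):
                problems.append(f"line {line}: no fragments file under '{path}'")
    return problems


if __name__ == "__main__":
    found = check_sample_sheet(sys.argv[1])
    for problem in found:
        print(problem)
    sys.exit(1 if found else 0)
```

Saved under a name of your choice (for example, a hypothetical check_samples.py), it takes the sample sheet as its only argument and exits non-zero if any problem is reported.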
Run the ATAC workflow:
nextflow run cellgeni/nf-cluster \
--input examples/samples.csv \
--atac.genome hg38 \
--outdir results

You can also provide parameters through a YAML/JSON params file:
nextflow run cellgeni/nf-cluster \
-params-file params.yml

- Required:
  - --input: CSV sample sheet with sample,path
  - --atac.genome: genome label used by the ATAC workflow
- Common:
  - --random_state
  - --outdir
- RAPIDS neighbors:
  - --neighbors.n_neighbors
  - --neighbors.algorithm
  - --neighbors.metric
  - --neighbors.method
- RAPIDS Leiden:
  - --leiden.resolution
  - --leiden.theta
  - --leiden.n_iterations
  - --leiden.key_added
- RAPIDS UMAP:
  - --umap.min_dist
  - --umap.spread
  - --umap.n_components
  - --umap.init_pos
- Embedding plotting:
  - --embeddingplot.basis (default: X_umap)
  - --embeddingplot.color (list, default includes leiden)
  - --embeddingplot.legend_loc
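In a params file, dotted names such as atac.genome are usually written as nested keys. The sketch below builds a JSON params file from the parameter names listed above. It assumes the pipeline reads dotted parameters as nested maps, which is the usual Nextflow behavior but is not confirmed here. The helper `nest` is hypothetical, and every value other than the input, genome, and outdir from the example command is a placeholder rather than a recommended or default setting.

```python
import json


def nest(dotted):
    """Turn {"atac.genome": "hg38"} into {"atac": {"genome": "hg38"}}."""
    nested = {}
    for key, value in dotted.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Placeholder values for illustration only; check the pipeline's own
# defaults before changing any of these.
PARAMS = {
    "input": "examples/samples.csv",
    "atac.genome": "hg38",
    "outdir": "results",
    "random_state": 0,
    "neighbors.n_neighbors": 15,
    "leiden.resolution": 1.0,
    "umap.min_dist": 0.5,
    "embeddingplot.color": ["leiden"],
}

if __name__ == "__main__":
    with open("params.json", "w") as handle:
        json.dump(nest(PARAMS), handle, indent=2)
```

The resulting params.json can then be passed with -params-file in place of the individual command-line flags.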
The pipeline writes linked outputs under the chosen outdir, including:
- h5ad/filtered: QC-filtered AnnData files
- h5ad/neighbors: AnnData with neighbor graph
- h5ad/leiden: AnnData with Leiden labels in obs
- h5ad/umap: AnnData with UMAP coordinates in obsm
- plots/embedding: PNG embedding plots (for example, UMAP colored by leiden)
- amulet: AMULET outputs
- reports: Nextflow execution report, timeline, trace, and DAG
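After a run, a quick check that the expected output locations exist can be sketched as follows. This assumes each entry in the list above is a directory directly under the chosen outdir with exactly that relative name. The function name `missing_outputs` is hypothetical.

```python
import sys
from pathlib import Path

# Relative output locations taken from the list above; treating each one
# as a directory is an assumption.
EXPECTED_OUTPUTS = [
    "h5ad/filtered",
    "h5ad/neighbors",
    "h5ad/leiden",
    "h5ad/umap",
    "plots/embedding",
    "amulet",
    "reports",
]


def missing_outputs(outdir):
    """Return the expected output directories that are absent under outdir."""
    root = Path(outdir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_outputs(sys.argv[1] if len(sys.argv) > 1 else "results")
    for rel in missing:
        print(f"missing: {rel}")
    sys.exit(1 if missing else 0)
```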