SVCFit

SVCFit is a fast and scalable computational tool designed to estimate the Structural Variant Cellular Fraction (SVCF) of inversions, deletions, tandem duplications, and translocations. Developed for the R environment, SVCFit integrates structural variant (SV) calls with Copy Number Variation (CNV) and Single Nucleotide Polymorphism (SNP) data to provide accurate cellular fraction estimates.

Resources

Open access data: Available on Mendeley Data (doi: 10.17632/2nhhdjx225.3).
Protected Data: Available via European Genome-phenome Archive (EGAD00001001343).
Prostate mixture scripts: GitHub Repository

Installation

SVCFit is hosted on GitHub. You can install it directly within R using the remotes package.

Note: Installation requires a GitHub Personal Access Token (PAT) because the repository is hosted on GitHub.

if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")

# 1. Setup GitHub Credentials (if not already configured)
if (!requireNamespace("usethis", quietly = TRUE))
    install.packages("usethis")

# Create a token in your browser
usethis::create_github_token() 

# Store the token (paste when prompted)
credentials::set_github_pat()

# 2. Install SVCFit
remotes::install_github("KarchinLab/SVCFit", build_vignettes = TRUE, dependencies = TRUE)

Input Requirements

SVCFit accepts Variant Call Format (VCF) files. By default, it is optimized for VCFs produced by the SVTyper package [2].

If using a different caller (e.g., Manta), ensure your VCF aligns with the following specification:

CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	normal	tumor
chr1	1000	INV:6:0:1:0:0:0	T	<INV>	100	PASS	END=1500;SVTYPE=INV;SVLEN=500;…	GT:PR:SR:…	0/1:76,0:70,0:…	0/1:76,0:70,0:…
chr2	5000	DEL:7:0:1:0:0:0	G	<DEL>	100	PASS	END=5300;SVTYPE=DEL;SVLEN=300;…	GT:PR:SR:…	0/1:76,0:70,0:…	0/1:76,0:70,0:…

Required fields:

SVTYPE (INFO; e.g., INV, DEL, DUP, BND)
END (INFO)
SVLEN (INFO)
Supporting read counts (FORMAT; e.g., PR, SR)
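
Before running the pipeline on output from a different caller, it can help to confirm that every record carries these fields. The following is a minimal base-R sketch and is not part of SVCFit: the helper name `check_sv_vcf()` is hypothetical, and it assumes an uncompressed, tab-delimited VCF with INFO in column 8 and FORMAT in column 9.

```r
# Hypothetical helper (not part of SVCFit): report which required fields
# are missing from each record of an uncompressed, tab-delimited VCF.
check_sv_vcf <- function(path,
                         info_keys = c("SVTYPE", "END", "SVLEN"),
                         format_keys = c("PR", "SR")) {
  lines <- readLines(path)
  lines <- lines[!startsWith(lines, "#") & nzchar(lines)]  # drop header and blank lines
  fields <- strsplit(lines, "\t", fixed = TRUE)

  missing <- vapply(fields, function(f) {
    # INFO is column 8 (KEY=VALUE pairs separated by ';'); FORMAT is column 9
    info_present <- sub("=.*$", "", strsplit(f[8], ";", fixed = TRUE)[[1]])
    format_present <- strsplit(f[9], ":", fixed = TRUE)[[1]]
    absent <- c(setdiff(info_keys, info_present),
                setdiff(format_keys, format_present))
    paste(absent, collapse = ",")
  }, character(1))

  data.frame(
    id = vapply(fields, `[`, character(1), 3),
    missing = missing,
    stringsAsFactors = FALSE
  )
}

# Toy example: the second record lacks SVLEN and SR
vcf <- tempfile(fileext = ".vcf")
writeLines(c(
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tnormal\ttumor",
  "chr1\t1000\tsv1\tT\t<INV>\t100\tPASS\tEND=1500;SVTYPE=INV;SVLEN=500\tGT:PR:SR\t0/1:76,0:70,0\t0/1:60,12:55,9",
  "chr2\t5000\tsv2\tG\t<DEL>\t100\tPASS\tEND=5300;SVTYPE=DEL\tGT:PR\t0/1:76,0\t0/1:60,12"
), vcf)

result <- check_sv_vcf(vcf)
print(result)
```

An empty `missing` value means the record has all the listed fields; anything else names the fields to add or map from the caller's own tags.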

Usage Workflow

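The core SVCFit pipeline consists of three main steps: Extraction, Characterization, and Calculation. Two optional steps follow: building a tumor evolution tree, and benchmarking against simulated data.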

1. Extract SV and SNP Information — `extract_info()`

Load and preprocess input VCF and CNV files. This step parses metadata, processes breakends (BND), and handles heterozygous SNPs. extract_info() internally performs:

Load input data — load_data()
Process BND events — proc_bnd()
Parse SV metadata — parse_sv_info()
Parse heterozygous SNPs — parse_het_snp(), parse_snp_on_sv()

info <- extract_info(
  p_het = "path/to/het_snps.vcf",
  p_onsv = "path/to/snps_on_sv.vcf",
  p_sv = "path/to/structural_variants.vcf",
  p_cnv = "path/to/cnv_file.txt",
  chr_lst = NULL,
  flank_del = 50, 
  QUAL_tresh = 100, 
  min_alt = 2, 
  tumor_only = FALSE
)

Function Arguments

`extract_info()`

Argument	Type	Default	Description
`p_het`	Character	—	Path to VCF of heterozygous SNPs.
`p_onsv`	Character	—	Path to VCF of SNPs overlapping SV-supporting reads.
`p_sv`	Character	—	Path to SV VCF.
`p_cnv`	Character	—	Path to CNV file.
`chr_lst`	Character	NULL	Chromosomes to include.
`flank_del`	numeric	50	Max distance to consider deletion overlapping a BND.
`QUAL_tresh`	numeric	100	Minimum QUAL score.
`min_alt`	numeric	2	Minimum number of reads supporting the alternate allele.
`tumor_only`	Logical	FALSE	Whether SVs come from tumor-only calling.

Output: A list of data frames containing parsed SV + SNP information.

2. Annotate SVs Using CNV and SNP Information — `characterize_sv()`

This step integrates CNV and heterozygous SNPs to infer phasing, zygosity, and overlapping CNV. characterize_sv() internally performs:

Assign SV IDs to SNPs — assign_svids()
Summarize phasing + zygosity — sum_sv_info()
Assign CNV to SV — assign_cnv()
Annotate overlapping CNV — annotate_cnv(), parse_snp_on_sv()

sv_char <- characterize_sv(
  sv_phase = info$sv_phase, 
  sv_info = info$sv_info, 
  cnv = info$cnv,
  flank_snp = 500,
  flank_cnv = 1000
)

Function Arguments

`characterize_sv()`

Argument	Type	Default	Description
`sv_phase`	data.frame	—	Phasing/zygosity from SNPs.
`sv_info`	data.frame	—	Parsed SV metadata.
`cnv`	data.frame	—	CNV data.
`flank_snp`	numeric	500	Max assignment distance for SNPs.
`flank_cnv`	numeric	1000	Max assignment distance for CNVs.
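
To illustrate what a maximum assignment distance means, the sketch below links each SNP to any SV whose interval, padded by a flank on both sides, contains the SNP position. This is a conceptual base-R illustration only: `assign_by_flank()` and the column names are hypothetical, and SVCFit's own assignment logic may differ.

```r
# Hypothetical illustration (not SVCFit's implementation): link each SNP to
# every SV whose interval, padded by `flank` bases on both sides, contains it.
assign_by_flank <- function(snps, svs, flank = 500) {
  pairs <- merge(snps, svs, by = "chrom")   # all same-chromosome SNP/SV pairs
  keep <- pairs$pos >= pairs$start - flank &
          pairs$pos <= pairs$end + flank
  pairs[keep, c("snp_id", "sv_id", "chrom", "pos")]
}

snps <- data.frame(
  snp_id = c("s1", "s2", "s3"),
  chrom  = c("chr1", "chr1", "chr2"),
  pos    = c(900, 2500, 5100)
)
svs <- data.frame(
  sv_id = c("INV1", "DEL1"),
  chrom = c("chr1", "chr2"),
  start = c(1000, 5000),
  end   = c(1500, 5300)
)

assigned <- assign_by_flank(snps, svs, flank = 500)
print(assigned)
```

Here `s1` lies 100 bases upstream of `INV1` and is assigned to it, `s3` falls inside `DEL1`, and `s2` is 1000 bases past the end of `INV1` and stays unassigned. A larger flank assigns more distant SNPs and CNVs, at the cost of weaker positional evidence.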

3. Calculate SVCF for Structural Variants — `calculate_svcf()`

This step computes the Structural Variant Cellular Fraction (SVCF).

svcf_out <- calculate_svcf(
  anno_sv_cnv = sv_char$anno_sv_cnv,
  sv_info     = sv_char$sv_info,
  thresh      = 0.1,
  samp        = "SampleID",
  exper       = "ExperimentID"
)

Function Arguments

Argument	Type	Default	Description
`anno_sv_cnv`	data.frame	—	CNV-annotated SVs.
`sv_info`	data.frame	—	Parsed SV info.
`thresh`	numeric	0.1	Threshold for SV-before-CNV inference.
`samp`	character	—	Sample name.
`exper`	character	—	Experiment name.

The output is an annotated VCF with four additional fields: VAF (variant allele frequency), Rbar (average break interval count in a sample), r (inferred integer copy number of break intervals), and SVCF (structural variant cellular fraction).
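
As a rough illustration of the VAF field, the sketch below derives a variant allele frequency from the PR and SR values of one sample column. It assumes PR and SR are "ref,alt" read-count pairs, as in Manta output, and pools paired and split reads; `vaf_from_sample()` is a hypothetical helper, and SVCFit may compute VAF differently.

```r
# Assumption: PR and SR are "ref,alt" read-count pairs, as in Manta output.
# VAF is taken here as alt / (ref + alt) over paired and split reads combined;
# SVCFit may define it differently.
vaf_from_sample <- function(format, sample) {
  keys <- strsplit(format, ":", fixed = TRUE)[[1]]
  vals <- strsplit(sample, ":", fixed = TRUE)[[1]]
  names(vals) <- keys
  counts <- function(key) {
    as.numeric(strsplit(vals[[key]], ",", fixed = TRUE)[[1]])
  }
  pr <- counts("PR")
  sr <- counts("SR")
  ref <- pr[1] + sr[1]
  alt <- pr[2] + sr[2]
  if (ref + alt == 0) return(NA_real_)   # no informative reads
  alt / (ref + alt)
}

# 100 reference-supporting and 30 alternate-supporting reads in total
vaf <- vaf_from_sample("GT:PR:SR", "0/1:60,20:40,10")
print(vaf)
```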

4. Build tumor evolution tree — `build_tree()`

This step builds the tumor evolutionary tree from SV clusters obtained with a Dirichlet process Gaussian Mixture Model (DP-GMM).

output <- cluster_data(
  pair_path, 
  pur_path, 
  pair=1)
clones <- output[[3]]  # passed to build_tree() below

build_tree(
  clones,
  lineage_precedence_thresh=0.2, 
  sum_filter_thresh=0.2)

Function Arguments

cluster_data()

Argument	Type	Default	Description
`pair_path`	character	—	Path to file with paired sample IDs for each patient.
`pur_path`	character	—	Path to file with purity for each sample.
`pair`	numeric	1	The identifier for the specific sample pair (patient) being analyzed.

build_tree()

Argument	Type	Default	Description
`clones`	data.frame	—	SV clustering result.
`lineage_precedence_thresh`	numeric	0.2	Maximum violation of lineage precedence rule.
`sum_filter_thresh`	numeric	0.2	Maximum violation of sum condition rule.

The output is a tumor evolutionary tree rooted at the germline (G), in which each node number corresponds to an SV cluster number. The branching of the nodes depicts the chronological order in which the SV clusters occurred.
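
The two thresholds can be read as tolerances on two rules that are commonly used in clonal tree reconstruction: a child cluster should not have a higher cellular fraction than its parent (lineage precedence), and the children of a node should not jointly exceed it (sum condition). The sketch below checks both rules on a candidate tree. It is an assumption-laden illustration, not SVCFit's implementation: `check_tree()`, its inputs, and the way violations are measured are all hypothetical.

```r
# Hypothetical check (not SVCFit's implementation) of the two rules on a
# candidate tree, given one mean cellular fraction per cluster.
check_tree <- function(parent, cf, lineage_precedence_thresh = 0.2,
                       sum_filter_thresh = 0.2) {
  # parent: named character vector, child -> parent ("G" is the germline root)
  # cf:     named numeric vector of cluster cellular fractions, with cf["G"] = 1
  children <- names(parent)

  # Lineage precedence: a child may exceed its parent by at most the threshold
  lineage_violation <- cf[children] - cf[parent]
  lineage_ok <- all(lineage_violation <= lineage_precedence_thresh)

  # Sum condition: children of one node may jointly exceed it by at most the threshold
  child_sums <- tapply(cf[children], parent, sum)
  sum_violation <- child_sums - cf[names(child_sums)]
  sum_ok <- all(sum_violation <= sum_filter_thresh)

  list(lineage_ok = lineage_ok, sum_ok = sum_ok)
}

cf <- c(G = 1, "1" = 0.9, "2" = 0.5, "3" = 0.3)
linear   <- c("1" = "G", "2" = "1", "3" = "2")  # G -> 1 -> 2 -> 3
branched <- c("1" = "G", "2" = "1", "3" = "1")  # clusters 2 and 3 both descend from 1
star     <- c("1" = "G", "2" = "G", "3" = "G")  # all three directly under G

print(check_tree(linear, cf))
print(check_tree(branched, cf))
print(check_tree(star, cf))
```

With these toy fractions the linear and branched trees satisfy both rules, while the star-shaped tree fails the sum condition because its three children sum to 1.7 under a root of 1.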

5. Simulation & Benchmarking

SVCFit includes utility functions for processing simulation data from VISOR and attaching “ground truth” labels to structural variants for benchmarking.

5.1 Read clonal assignment

truth <- load_truth(
  truth_path = "path/to/truth_beds", 
  overlap = FALSE
  )

This function has 2 arguments:

Argument	Type	Default	Description
`truth_path`	Character	N/A	Path to BED files storing true structural variant information with clonal assignment. BED files should be named `c1.bed`, `c2.bed`, etc. for non-overlapping simulations and `c11.bed`, `c22.bed`, etc. for overlapping simulations. Structural variants that belong to different (sub)clones should be saved in separate BED files.
`overlap`	Logical	FALSE	Whether the simulation has SV-CNV overlap.

The file path should follow this structure:

root/
├── true_clone/
│   ├── c1.bed
│   ├── c2.bed
│   ├── c3.bed
│   └── ...

A parent node should always have a lower number in its file name than its children (e.g., the parent is c1.bed rather than c3.bed), and each child node's BED file should contain its ancestors' mutations.
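
The layout and naming convention can be illustrated with a small base-R sketch that builds a toy `true_clone` directory and reads it back. The reader `read_truth_beds()` is hypothetical and is not SVCFit's `load_truth()`; it assumes tab-delimited BED files and keeps only the first three columns.

```r
# Hypothetical reader (not SVCFit's load_truth()): collect c*.bed files from a
# directory and tag each SV with the clone file it came from.
read_truth_beds <- function(dir) {
  files <- list.files(dir, pattern = "^c[0-9]+\\.bed$", full.names = TRUE)
  beds <- lapply(files, function(f) {
    bed <- read.table(f, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
    bed <- bed[, 1:3]                      # first three BED columns
    names(bed) <- c("chrom", "start", "end")
    bed$clone <- sub("\\.bed$", "", basename(f))
    bed
  })
  do.call(rbind, beds)
}

# Toy layout: c2 is a child of c1, so c2.bed repeats c1's SV and adds its own
dir <- file.path(tempdir(), "true_clone")
dir.create(dir, showWarnings = FALSE)
writeLines("chr1\t1000\t1500", file.path(dir, "c1.bed"))
writeLines(c("chr1\t1000\t1500", "chr2\t5000\t5300"), file.path(dir, "c2.bed"))

truth_toy <- read_truth_beds(dir)
print(truth_toy)
```

Because the child file repeats its ancestor's SV, the chr1 event appears under both `c1` and `c2`, while the chr2 event appears only under `c2`.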

5.2 Attach clonal assignment to output

svcf_truth <- attach_truth(svcf_out, truth)

This function has 2 arguments:

Argument	Type	Default	Description
`svcf_out`	DataFrame	N/A	The output from `calculate_svcf()`.
`truth`	DataFrame	N/A	Stores the clone assignment for each structural variant designed in a simulation.

This appends the known clonal assignment to the calculated SVCF output for performance evaluation.
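
Conceptually, attaching truth labels is a left join of the estimates onto the truth table. The sketch below shows that idea with base R `merge()` on toy data. The column names and join keys are hypothetical, and the real `svcf_out` and `truth` tables may be keyed differently.

```r
# Hypothetical columns; the real svcf_out and truth tables may differ.
svcf_toy <- data.frame(
  chrom = c("chr1", "chr2", "chr3"),
  start = c(1000, 5000, 7000),
  svcf  = c(0.92, 0.41, 0.15)
)
truth_labels <- data.frame(
  chrom = c("chr1", "chr2"),
  start = c(1000, 5000),
  clone = c("c1", "c2")
)

# Left join: keep every estimated SV, with NA where no truth label exists
labelled <- merge(svcf_toy, truth_labels, by = c("chrom", "start"), all.x = TRUE)
print(labelled)
```

SVs without a matching truth record (the chr3 event here) keep an `NA` label, which makes false-positive calls easy to count during evaluation.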

Tutorial

library(SVCFit)
vignette("SVCFit_guide", package = "SVCFit")

Reference

[1] Cmero, M. et al. (2020) Inferring structural variant cancer cell fraction. Nature Communications, 11(1), 730.
[2] Chen, X. et al. (2016) Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics, 32, 1220-1222. doi:10.1093/bioinformatics/btv710
