feat: add input validation for early error detection #446

Open
haoyu-haoyu wants to merge 1 commit into RosettaCommons:main from haoyu-haoyu:feat/add-input-validation
@haoyu-haoyu
Summary

  • Add rfdiffusion/validation.py with validators for PDB files, contig strings, checkpoint paths, hotspot residues, and diffuser config
  • Integrate validation into Sampler.initialize() and sample_init(), before GPU allocation and model loading

Currently, invalid inputs (missing PDB, malformed contigs, bad hotspot format) produce cryptic errors deep in tensor operations after the model is already loaded on GPU. This PR catches these errors early with clear, actionable messages.
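The fail-fast ordering described above can be sketched as follows. This is a minimal illustration only, not the PR's actual code: `run_preflight`, the `checks` list, and the `load_model` callable are hypothetical names, and `initialize` here is a stand-in for `Sampler.initialize()`, whose real signature is not shown in this PR description.

```python
class ValidationError(ValueError):
    """Raised when a user-supplied input fails a pre-flight check."""


def run_preflight(checks):
    """Run each zero-argument check in order; the first failure propagates."""
    for check in checks:
        check()


def initialize(checks, load_model):
    """Hypothetical stand-in for Sampler.initialize(): validate, then load."""
    run_preflight(checks)  # cheap, CPU-only checks run first
    return load_model()    # the expensive step runs only if every check passed
```

The point of the ordering is that a bad input raises `ValidationError` before `load_model` is ever called, so no GPU memory is allocated for a run that cannot succeed.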

Validators

| Function | What it checks |
| --- | --- |
| validate_pdb_path() | File exists, contains ATOM records with valid coordinates |
| validate_contig_string() | Syntax: ranges 10-20, chain specs A5-50, no negatives, start ≤ end |
| validate_checkpoint_path() | Model checkpoint file exists |
| validate_hotspot_res() | Format A50 (chain letter + integer) |
| validate_diffuser_config() | T ≥ 1, partial_T ≤ T |
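As a rough illustration of the three purely syntactic checks listed above, the sketch below implements them with regular expressions. The function names come from this PR, but the signatures, the regular expressions, and most of the error wording are assumptions. The contig grammar is deliberately simplified to `/`-separated segments of the form `10`, `10-20`, or `A5-50`, which may be narrower than what RFdiffusion actually accepts.

```python
import re


class ValidationError(ValueError):
    """Raised when a user-supplied input fails a pre-flight check."""


# Optional chain letter, a start residue, and an optional "-end" part.
_SEGMENT = re.compile(r"([A-Za-z])?(\d+)(?:-(\d+))?")
# One chain letter followed by a residue number, e.g. "A50".
_HOTSPOT = re.compile(r"[A-Za-z]\d+")


def validate_contig_string(contig):
    """Check each '/'-separated segment (simplified grammar, see lead-in)."""
    for raw in contig.split("/"):
        segment = raw.strip()
        match = _SEGMENT.fullmatch(segment)
        if match is None:  # also rejects negatives such as "-5-10"
            raise ValidationError(f"Invalid contig segment: '{segment}'")
        start, end = match.group(2), match.group(3)
        if end is not None and int(start) > int(end):
            raise ValidationError(
                f"Invalid contig range: '{segment}' (start > end)"
            )


def validate_hotspot_res(hotspots):
    """Check that every hotspot is a chain letter plus an integer."""
    for hotspot in hotspots:
        if _HOTSPOT.fullmatch(hotspot) is None:
            raise ValidationError(f"Invalid hotspot residue: '{hotspot}'")


def validate_diffuser_config(T, partial_T=None):
    """Check T >= 1 and, when partial_T is given, partial_T <= T."""
    if T < 1:
        raise ValidationError(f"T must be at least 1, got {T}")
    if partial_T is not None and partial_T > T:
        raise ValidationError(f"partial_T ({partial_T}) exceeds T ({T})")
```

Under these assumptions, `validate_contig_string("10-20/A5-50")` returns silently, while `validate_contig_string("20-10")` raises a `ValidationError`.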

Example error messages

ValidationError: Input PDB file not found: /path/to/missing.pdb
ValidationError: Invalid contig range: '20-10' (start > end)
ValidationError: Model checkpoint not found: models/Base_ckpt.pt. Please download models following the README instructions.
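One way the two file-based validators could produce messages like these is sketched below. Again, this is an assumption-laden sketch and not the code in `rfdiffusion/validation.py`: the helper `check_atom_lines` is hypothetical, the messages for empty or unparsable files are invented for illustration, and "valid coordinates" is interpreted here as "the fixed-width x, y, z fields of each ATOM record (columns 31-54 in the PDB format) parse as finite floats".

```python
import math
from pathlib import Path


class ValidationError(ValueError):
    """Raised when a user-supplied input fails a pre-flight check."""


def check_atom_lines(lines, source="<input>"):
    """Count ATOM records, requiring finite x/y/z in columns 31-54."""
    n_atoms = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith("ATOM"):
            continue
        try:
            # 0-based slices for the three 8-character coordinate fields.
            coords = [float(line[i:i + 8]) for i in (30, 38, 46)]
        except ValueError:
            raise ValidationError(
                f"{source}: unparsable coordinates on line {lineno}"
            ) from None
        if not all(math.isfinite(c) for c in coords):
            raise ValidationError(
                f"{source}: non-finite coordinates on line {lineno}"
            )
        n_atoms += 1
    if n_atoms == 0:
        raise ValidationError(f"{source}: no ATOM records found")
    return n_atoms


def validate_pdb_path(path):
    """Check that the file exists and holds at least one valid ATOM record."""
    pdb = Path(path)
    if not pdb.is_file():
        raise ValidationError(f"Input PDB file not found: {path}")
    with pdb.open() as handle:
        return check_atom_lines(handle, source=str(pdb))


def validate_checkpoint_path(path):
    """Check that the model checkpoint file exists."""
    if not Path(path).is_file():
        raise ValidationError(
            f"Model checkpoint not found: {path}. "
            "Please download models following the README instructions."
        )
```

Both checks touch only the filesystem, so they are cheap enough to run before any model weights are loaded.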

Test plan

  • Missing PDB → clear ValidationError before model loading
  • Empty PDB (no ATOM records) → clear error
  • Malformed contig strings → clear error
  • All valid existing inputs still work unchanged

Add rfdiffusion/validation.py with validators for:
- PDB file existence and ATOM record format
- Contig string syntax (ranges, chain-residue specs)
- Model checkpoint existence
- Hotspot residue format (chain letter + number)
- Diffuser config parameters (T, partial_T bounds)

Validators are called in Sampler.initialize() and sample_init(), before
GPU allocation and model loading, so users get clear error messages
instead of cryptic tensor shape mismatches.