This guide explains how to manually create InputOutputInfo and ParallelizabilityInfo generators for a new command in PaSh.
PaSh uses a dictionary to map shell command names to Python class names. To register a new command:
- Open `AnnotationGeneration.py` (or the relevant file where `DICT_CMD_NAME_TO_REPRESENTATION_IN_MODULE_NAMES` is defined).
- Add an entry for the new command:

      DICT_CMD_NAME_TO_REPRESENTATION_IN_MODULE_NAMES = {
          ...
          "<command-name>": "<ClassRepresentation>",  # Add your new command here
          ...
      }
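To make the role of this mapping concrete, here is a minimal sketch of how such a dictionary might be consulted. The dictionary below is a small stand-in, not PaSh's real contents, and `lookup_class_representation` is a hypothetical helper introduced only for illustration; the `"cat-wrapper"` entry is an example command.

```python
from typing import Optional

# Stand-in for the real dictionary; the entry is illustrative only.
DICT_CMD_NAME_TO_REPRESENTATION_IN_MODULE_NAMES = {
    "cat-wrapper": "CatWrapper",
}


def lookup_class_representation(cmd_name: str) -> Optional[str]:
    """Hypothetical helper: return the class representation registered
    for a command name, or None if the command has not been registered."""
    return DICT_CMD_NAME_TO_REPRESENTATION_IN_MODULE_NAMES.get(cmd_name)


print(lookup_class_representation("cat-wrapper"))
print(lookup_class_representation("not-registered"))
```

In this sketch an unregistered command yields `None`, which a caller could treat as "no annotation available".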
Each command requires two generators: an InputOutputInfo generator, which describes how the command handles input and output files, and a ParallelizabilityInfo generator, which describes how it can be parallelized.
Navigate to:

    pash_annotations/annotation_generation/annotation_generators/

Create two files named:

    InputOutputInfoGenerator<ClassRepresentation>.py
    ParallelizabilityInfo<ClassRepresentation>.py

If your command is "cat-wrapper", the files should be:

    InputOutputInfoGeneratorCatWrapper.py
    ParallelizabilityInfoCatWrapper.py

Inside the newly created files, define a class that inherits from the appropriate interface:
- For Input/Output Behavior: Inherit from `InputOutputInfoGeneratorInterface`
- For Parallelization Behavior: Inherit from `ParallelizabilityInfoGeneratorInterface`
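The overall shape of the two classes could look roughly like the sketch below. So that it runs on its own, it defines stand-in versions of the two interfaces. In a real generator you would import the actual interfaces from `pash_annotations` instead. The method name `generate_info`, the no-argument constructors, and the class names (chosen to match the file names) are assumptions made for illustration, so check an existing generator in the repository for the real signatures.

```python
from abc import ABC, abstractmethod


# Stand-ins for the real PaSh interfaces (assumption: a single abstract
# method named generate_info; the real interfaces may differ).
class InputOutputInfoGeneratorInterface(ABC):
    @abstractmethod
    def generate_info(self) -> None:
        ...


class ParallelizabilityInfoGeneratorInterface(ABC):
    @abstractmethod
    def generate_info(self) -> None:
        ...


# Would live in InputOutputInfoGeneratorCatWrapper.py
class InputOutputInfoGeneratorCatWrapper(InputOutputInfoGeneratorInterface):
    def generate_info(self) -> None:
        # Describe how cat-wrapper reads input and writes output here.
        pass


# Would live in ParallelizabilityInfoCatWrapper.py
class ParallelizabilityInfoCatWrapper(ParallelizabilityInfoGeneratorInterface):
    def generate_info(self) -> None:
        # Describe which parallelization strategies are safe here.
        pass
```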
In the InputOutputInfo generator, specify how your command processes input and produces output. This includes:
- How the command reads input.
- Whether it writes to stdout or modifies files in place.
- How each flag affects input and output behavior.
For example:
- Commands like `cat` read from stdin or files and write to stdout.
- Commands like `mv` modify files in place without stdout output.
- Commands like `grep` take both input files and options that affect behavior.
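Before writing the generator, it can help to write down the input/output behavior you intend to encode. The sketch below does this in plain Python for the three example commands. `IOSummary` and the `summarize_*` functions are hypothetical names that are not part of PaSh, and only a simplified subset of each command's behavior is modeled (for instance, `mv` is limited to its two-operand form).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IOSummary:
    """Hypothetical record of what a command invocation reads and writes."""
    input_files: List[str] = field(default_factory=list)
    reads_stdin: bool = False
    writes_stdout: bool = False
    modified_files: List[str] = field(default_factory=list)


def summarize_cat(operands: List[str]) -> IOSummary:
    # cat reads stdin when given no operands or the operand "-".
    files = [op for op in operands if op != "-"]
    return IOSummary(input_files=files,
                     reads_stdin=(not operands or "-" in operands),
                     writes_stdout=True)


def summarize_mv(operands: List[str]) -> IOSummary:
    # Two-operand form only: mv SOURCE DEST changes both paths, no stdout.
    source, dest = operands
    return IOSummary(modified_files=[source, dest])


def summarize_grep(flags: Dict[str, str], operands: List[str]) -> IOSummary:
    # With -e PATTERN or -f FILE the pattern is not an operand, so every
    # operand is an input file; otherwise the first operand is the pattern.
    pattern_given_by_flag = "-e" in flags or "-f" in flags
    data_files = operands if pattern_given_by_flag else operands[1:]
    # A pattern file named by -f is also read, so it counts as an input.
    pattern_files = [flags["-f"]] if "-f" in flags else []
    return IOSummary(input_files=pattern_files + data_files,
                     reads_stdin=not data_files,
                     writes_stdout=True)
```

The `grep` case shows why each flag needs attention: the same operand list means something different depending on whether `-e` or `-f` is present.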
In the ParallelizabilityInfo generator, define what parallelization strategies can be applied while maintaining correct execution. Consider:
- Whether the command can process input in independent chunks (e.g., `sort` can, but `grep` with `-A` or `-B` cannot).
- Whether it can be executed in parallel on separate input files.
- Whether it requires ordering constraints to maintain correctness.
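To see why context flags break chunk independence, consider the small simulation below. `grep_after` is a simplified, hypothetical stand-in for `grep -A N` (substring matching only, no `--` group separators), not PaSh or grep code. It shows that running the command on each chunk and concatenating the outputs loses context lines that fall across a chunk boundary.

```python
from typing import List


def grep_after(lines: List[str], needle: str, after: int) -> List[str]:
    """Simplified model of `grep -A after`: keep each matching line
    plus up to `after` lines that follow it."""
    out = []
    remaining = 0
    for line in lines:
        if needle in line:
            out.append(line)
            remaining = after
        elif remaining > 0:
            out.append(line)
            remaining -= 1
    return out


lines = ["a", "match", "ctx1", "ctx2", "b"]

# Running on the whole input keeps both context lines.
whole = grep_after(lines, "match", 2)

# Splitting right after the match separates it from its context.
chunks = [lines[:2], lines[2:]]
chunked = [line for chunk in chunks for line in grep_after(chunk, "match", 2)]

print(whole)    # ['match', 'ctx1', 'ctx2']
print(chunked)  # ['match']
```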
For example:
- `sort` can process chunks independently, then merge results.
- `wc` can process chunks independently, and would then sum up the results.
- `cat` with no flags is stateless, so the default options work.
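The three examples above can be sketched in plain Python as a split, per-chunk, then combine pattern. This is only an illustration of the idea, not how PaSh implements it: all function names are hypothetical, chunks are processed sequentially rather than in parallel, and Python's string ordering stands in for the locale-dependent ordering of `sort`.

```python
import heapq
from typing import List


def split_into_chunks(lines: List[str], n_chunks: int) -> List[List[str]]:
    # Contiguous chunks of roughly equal size (ceiling division).
    size = max(1, -(-len(lines) // n_chunks))
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def chunked_sort(lines: List[str], n_chunks: int = 2) -> List[str]:
    # sort: sort each chunk independently, then merge the sorted chunks.
    sorted_chunks = [sorted(chunk) for chunk in split_into_chunks(lines, n_chunks)]
    return list(heapq.merge(*sorted_chunks))


def chunked_word_count(lines: List[str], n_chunks: int = 2) -> int:
    # wc -w: count words per chunk, then sum the partial counts.
    partial_counts = [sum(len(line.split()) for line in chunk)
                      for chunk in split_into_chunks(lines, n_chunks)]
    return sum(partial_counts)


def chunked_cat(lines: List[str], n_chunks: int = 2) -> List[str]:
    # cat: each chunk passes through unchanged; concatenate in order.
    return [line for chunk in split_into_chunks(lines, n_chunks) for line in chunk]


text = ["delta echo", "alpha", "charlie bravo foxtrot", "bravo"]
print(chunked_sort(text) == sorted(text))
print(chunked_word_count(text))
print(chunked_cat(text) == text)
```

The combine step is what differs between commands: a merge for `sort`, a sum for `wc`, and plain ordered concatenation for `cat`.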
By implementing these details, you ensure efficient parallel execution while preserving the functional correctness of your command.