Releases: PSAL-POSTECH/PyTorchSim
PyTorchSim v1.1.0 released
Changelog — v1.1.0
TOGSim (simulator)
- Memory backend: updated to Ramulator 2.1.
- Config format: Configuration files have migrated from JSON to YAML format.
- Stats & robustness: Clearer DRAM bandwidth reporting, safer idle-stat handling, fixes for local/remote memory stats.
- Scheduling: Internal graph API cleanup (non-breaking, no user-facing API changes).
- Trace files support comments; improved CLI help.
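For the JSON-to-YAML config change, a one-off conversion script may be enough for simple files. The following is a minimal, standard-library-only sketch and not a tool shipped with PyTorchSim. The function name `to_yaml` and the config keys (`num_cores`, `dram`, `tags`) are hypothetical, and the sketch assumes the config schema itself did not change, which these notes do not state. A YAML library such as PyYAML (`yaml.safe_dump`) is the more robust choice when available.

```python
import json
import re

# Keys matching this pattern are emitted unquoted; all others are JSON-quoted.
_PLAIN_KEY = re.compile(r"^[A-Za-z_][A-Za-z0-9_\-]*$")
# Words that some YAML parsers read as booleans or null when left unquoted.
_RESERVED = {"true", "false", "null", "yes", "no", "on", "off", "y", "n"}


def _key(name):
    """Return a mapping key, quoting it when a plain form could be misread."""
    if _PLAIN_KEY.match(name) and name.lower() not in _RESERVED:
        return name
    return json.dumps(name)


def to_yaml(value, indent=0):
    """Render parsed JSON data as block-style YAML text.

    Scalars and empty containers are written with json.dumps, which is valid
    because JSON scalars and flow collections are also valid YAML.
    """
    pad = "  " * indent
    if isinstance(value, dict) and value:
        lines = []
        for k, v in value.items():
            if isinstance(v, (dict, list)) and v:
                lines.append(f"{pad}{_key(k)}:")
                lines.append(to_yaml(v, indent + 1))
            else:
                lines.append(f"{pad}{_key(k)}: {json.dumps(v)}")
        return "\n".join(lines)
    if isinstance(value, list) and value:
        lines = []
        for item in value:
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}-")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}- {json.dumps(item)}")
        return "\n".join(lines)
    return pad + json.dumps(value)


# Hypothetical legacy config; the key names are illustrative only.
legacy = json.loads(
    '{"num_cores": 2, "dram": {"type": "ramulator2", "channels": 4}, "tags": ["a", "b"]}'
)
converted = to_yaml(legacy)
print(converted)
```

Strings stay double-quoted in the output, which keeps values such as `"on"` or `"1e5"` from being reinterpreted by a YAML parser.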
Compiler & runtime (PyTorchSim / MLIR)
- PyTorch version: 2.1 → 2.8 (#196)
- Operators: SDPA can now be routed to a dedicated NPU kernel via the `torch.nn.attention.sdpa_kernel([SDPBackend.FLASH_ATTENTION])` context manager; TopK, Bitonic sort, Cat added. (#198)
- CNNs: MobileNet CI and 1×1 spatial conv as linear; baseline group convolution decomposition + tests. (#205)
- Dtypes / codegen: Fixed float16 codegen in MLIR templates; worked around gem5 `lmul8` widening issue by avoiding the problematic vector-width in codegen.
- TOGSim session: Run kernels under `with TOGSimulator(config_path=...):` so config and simulator lifecycle are scoped to the block.
- Multi-tenant launch: Call `torch.npu.launch_model(opt_fn, *args, stream_index=..., timestamp=..., **kwargs)` inside that block.
- Cleanup: Removed legacy scheduler code; standardized on the TOGSimulator-oriented API.
Device (OpenReg / NPU)
- Device API: Use `torch.device("npu")` (and `torch.device("npu:0")`, etc.) like any built-in device type. No extra package import beyond `import torch` is needed; the NPU backend registers with PyTorch's device system.
- Eager mode: CPU fallback is applied automatically when graph compilation is not available.
⚠️ Breaking Changes
- Config format migration: Configuration files must be converted from JSON to YAML format. Existing .json config files are no longer supported.
- Multi-tenant API redesign: The scheduler-based multi-tenant launch pattern has been replaced. The old API required manual `Scheduler` instantiation, `Request` object construction, and a `while not scheduler.is_finished():` loop. The new API uses a `with TOGSimulator(config_path=...):` context and `torch.npu.launch_model(..., stream_index=..., timestamp=...)` calls directly. See `test_scheduler.py` for the updated usage pattern.
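The shape of the new multi-tenant pattern can be illustrated without PyTorchSim installed. The sketch below uses a hypothetical stand-in class, `FakeSimulator`, that only records launches. It is not the TOGSimulator implementation: its `launch_model` method stands in for the module-level `torch.npu.launch_model`, and the timestamp values and their units are arbitrary. It shows how a `with` block scopes session setup and teardown, which is what replaces the manual scheduler loop.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSimulator:
    """Hypothetical stand-in for a context-managed simulator session."""

    config_path: str
    active: bool = False
    launches: list = field(default_factory=list)

    def __enter__(self):
        # A real session would load the config and start the simulator here.
        self.active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        # Teardown runs even if the block raised; exceptions are not swallowed.
        self.active = False
        return False

    def launch_model(self, fn, *args, stream_index=0, timestamp=0, **kwargs):
        """Run fn and record which stream and timestamp it was launched with."""
        if not self.active:
            raise RuntimeError("launch_model called outside the session block")
        result = fn(*args, **kwargs)
        self.launches.append((stream_index, timestamp, result))
        return result


def double(x):
    """Toy workload standing in for a compiled model function."""
    return 2 * x


# Two tenants share one session; each launch names its stream and timestamp.
with FakeSimulator(config_path="config.yaml") as sim:
    sim.launch_model(double, 3, stream_index=0, timestamp=0)
    sim.launch_model(double, 5, stream_index=1, timestamp=100)

print(sim.launches)
print(sim.active)
```

With this structure there is no explicit completion loop in user code; leaving the block is the point at which the session ends. For the actual API, `test_scheduler.py` in the repository remains the reference.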
CI, tests, experiments
- Added or tightened tests for DeepSeek, YOLOv5, MobileNet; CI image updated for PyTorch 2.8.
Other
- Misc. codegen, indexing, and matmul-related bugfixes and small refactors.
New Contributors
- @MinkyuPark0816 made their first contribution in #206
- @Jagggged made their first contribution in #208
Full Changelog: v1.0.1...v1.1.0