Releases: PSAL-POSTECH/PyTorchSim

PyTorchSim v1.1.0 released

25 Apr 10:32
f595cef

Changelog — v1.1.0

TOGSim (simulator)

  • Memory backend: updated to Ramulator 2.1.
  • Config format: Configuration files have migrated from JSON to YAML.
  • Stats & robustness: Clearer DRAM bandwidth reporting, safer idle-stat handling, fixes for local/remote memory stats.
  • Scheduling: Internal graph API cleanup (non-breaking, no user-facing API changes). Trace files now support comments; CLI help is improved.

Compiler & runtime (PyTorchSim / MLIR)

  • PyTorch version: 2.1 → 2.8 (#196)
  • Operators: SDPA can now be routed to a dedicated NPU kernel via torch.nn.attention.sdpa_kernel([SDPBackend.FLASH_ATTENTION]) context manager; TopK, Bitonic sort, Cat added. (#198)
  • CNNs: MobileNet added to CI; 1×1 spatial convolutions are handled as linear layers; baseline group convolution decomposition added, with tests. (#205)
  • Dtypes / codegen: Fixed float16 codegen in MLIR templates; worked around a gem5 lmul8 widening issue by avoiding the problematic vector width in codegen.
  • TOGSim session: Run kernels inside a with TOGSimulator(config_path=...): block so that the config and the simulator lifecycle are scoped to that block.
  • Multi-tenant launch: Call torch.npu.launch_model(opt_fn, *args, stream_index=..., timestamp=..., **kwargs) inside that block.
  • Cleanup: Removed legacy scheduler code; standardized on the TOGSimulator-oriented API.
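
The session and multi-tenant bullets above can be sketched as follows. This is a minimal illustration and not code from the PyTorchSim repository: launch_tenants, fake_simulator, and fake_launch are hypothetical names, and the simulator class and launch function are passed in as arguments because these notes do not state the import path of TOGSimulator. The stand-ins only record calls so the sketch runs without PyTorchSim installed; the units of timestamp and the behavior on block exit are not specified here and are not modeled.

```python
from contextlib import contextmanager


def launch_tenants(simulator_cls, launch_model, config_path, tenants):
    """Launch each (opt_fn, args, timestamp) tenant on its own stream
    inside a single simulator session (hypothetical helper)."""
    with simulator_cls(config_path=config_path):
        for stream_index, (opt_fn, args, timestamp) in enumerate(tenants):
            # Mirrors the call shape given in the notes:
            # launch_model(opt_fn, *args, stream_index=..., timestamp=...)
            launch_model(opt_fn, *args,
                         stream_index=stream_index, timestamp=timestamp)


# Stand-ins (assumptions) so the sketch is runnable without PyTorchSim.
events = []


@contextmanager
def fake_simulator(config_path):
    events.append(("enter", config_path))
    try:
        yield
    finally:
        events.append(("exit", config_path))


def fake_launch(opt_fn, *args, stream_index, timestamp, **kwargs):
    events.append(("launch", stream_index, timestamp, opt_fn(*args)))


launch_tenants(
    fake_simulator,
    fake_launch,
    "config.yaml",                      # hypothetical config path
    [(abs, (-1,), 0), (abs, (-2,), 10)],
)
```

With PyTorchSim installed, the same helper would presumably be called with TOGSimulator and torch.npu.launch_model in place of the two stand-ins, and compiled model functions in place of abs.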

Device (OpenReg / NPU)

  • Device API: Use torch.device("npu") (and torch.device("npu:0"), etc.) like any built-in device type — no extra package import beyond import torch; the NPU backend registers with PyTorch's device system.
  • Eager mode: CPU fallback is applied automatically when graph compilation is not available.

⚠️ Breaking Changes

  • Config format migration: Configuration files must be converted from JSON to YAML format. Existing .json config files are no longer supported.
  • Multi-tenant API redesign: The scheduler-based multi-tenant launch pattern has been replaced. The old API required manual Scheduler instantiation, Request object construction, and a while not scheduler.is_finished(): loop. The new API uses a with TOGSimulator(config_path=...): context and torch.npu.launch_model(..., stream_index=..., timestamp=...) calls directly. See test_scheduler.py for the updated usage pattern.

CI, tests, experiments

  • Added or tightened tests for DeepSeek, YOLOv5, MobileNet; CI image updated for PyTorch 2.8.

Other

  • Misc. codegen, indexing, and matmul-related bugfixes and small refactors.

Full Changelog: v1.0.1...v1.1.0

v1.0.0

04 Aug 08:39

[CI] Fix docker file and action credential issue