feat(torch): expose optional codegen parameters #619

Open
voltjia wants to merge 1 commit into master from feat/torch-codegen-optional-overloads

voltjia commented May 20, 2026

Summary

  • Expose supported ATen optional parameters as stable InfiniOps C++ parameters in generated PyTorch operator bases.
  • Bind generated PyTorch backends to existing src/base/<op>.h overloads when available, forwarding omitted optional/default ATen parameters as typed defaults.
  • Add std::optional<T> support to operator cache hashing and update the generated torch-op test harness for optional arguments and known vendor-specific PyTorch crashes/divergences.
  • Add generator tests for optional parameter exposure and existing-base overload binding.
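
As a rough illustration of the first bullet, the sketch below shows one way a generator could decide whether an optional ATen schema parameter is exposed as `std::optional<...>` or kept hidden and forwarded as a typed empty optional. The two tables, the function name `plan_optional_param`, and the InfiniOps-side `Tensor` type name are assumptions made for this example; the actual type lists live in `scripts/generate_torch_ops.py` and are not reproduced in this description.

```python
# Hypothetical tables: the real generator's type lists are not shown in this PR.
# ATen schema optional type -> C++ parameter type exposed on the generated base.
EXPOSED_OPTIONAL_TYPES = {
    "Tensor?": "std::optional<Tensor>",
    "int?": "std::optional<int64_t>",
    "float?": "std::optional<double>",
    "bool?": "std::optional<bool>",
}

# PyTorch-internal optional types assumed to have no stable public representation.
HIDDEN_OPTIONAL_TYPES = {
    "Generator?": "std::optional<at::Generator>",
    "Layout?": "std::optional<at::Layout>",
    "MemoryFormat?": "std::optional<at::MemoryFormat>",
}


def plan_optional_param(name: str, schema_type: str) -> dict:
    """Decide whether an optional schema parameter is exposed or hidden."""
    if schema_type in EXPOSED_OPTIONAL_TYPES:
        # Exposed: the caller supplies the value, so it is forwarded by name.
        return {
            "exposed": True,
            "cpp_type": EXPOSED_OPTIONAL_TYPES[schema_type],
            "forward": name,
        }

    if schema_type in HIDDEN_OPTIONAL_TYPES:
        cpp_type = HIDDEN_OPTIONAL_TYPES[schema_type]

        # Hidden: absent from the signature; a typed empty optional is forwarded.
        return {"exposed": False, "cpp_type": cpp_type, "forward": f"{cpp_type}{{}}"}

    raise ValueError(f"Unsupported optional schema type: {schema_type}")
```

For example, `plan_optional_param("dim", "int?")` yields an exposed `std::optional<int64_t>` parameter, while `plan_optional_param("generator", "Generator?")` stays out of the signature and forwards `std::optional<at::Generator>{}` at the call site.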

Motivation

The PyTorch code generator previously hid optional ATen schema parameters and always forwarded typed nullopt values. That made generated APIs unable to exercise non-default optional behavior and caused drift against operator base headers that intentionally expose optional parameters. This PR makes optional schema handling explicit while keeping existing hand-written bases as the public API source of truth when they are present.

Closes: N/A. This is follow-up work from the PyTorch codegen/base drift discussion.

Type of Change

  • feat — new feature / new operator / new platform
  • N/A — fix — bug fix.
  • N/A — perf — performance improvement (no behavioral change).
  • N/A — refactor — code restructuring without behavior change.
  • N/A — test — adding or fixing tests only.
  • N/A — docs — documentation only.
  • N/A — build / ci — build system or CI configuration.
  • N/A — chore — tooling, formatting, or other non-code changes.
  • N/A — Breaking change.

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • N/A — Build system / CMake / CI; no CMake or CI files are changed.
  • Python bindings / user-facing API

Test Results on Supported Platforms

Every run generated the PyTorch operator sources before building and was installed with WITH_TORCH=ON. The full pytest suite was run with verbose test names enabled.

| Platform | Built | pytest Result | Build Time | Test Time | Total Time | Notes / Hardware |
|---|---|---|---|---|---|---|
| NVIDIA | Yes | 9273 passed, 8600 skipped | 947s | 432s | 1384s | PyTorch backend compiled and tests/test_torch_ops.py ran. |
| Iluvatar | Yes | 7771 passed, 8584 skipped | 766s | 589s | 1355s | PyTorch backend compiled and tests/test_torch_ops.py ran. |
| MetaX | Yes | 8765 passed, 7590 skipped | 1344s | 449s | 1794s | PyTorch backend compiled and tests/test_torch_ops.py ran. |
| Cambricon | Yes | 5968 passed, 10003 skipped | 2438s | 1023s | 3461s | PyTorch backend compiled and tests/test_torch_ops.py ran. |
| Moore | Yes | 8531 passed, 7842 skipped | 2174s | 607s | 2781s | PyTorch backend compiled and tests/test_torch_ops.py ran; native_batch_norm is skipped for a torch_musa _out/functional divergence. |
| Ascend | Yes | 7447 passed, 8866 skipped | 1089s | 691s | 1826s | Pytest summary passed; the container exited with code 137 after pytest_end, matching the known post-test Ascend container behavior. |

Validation details

python scripts/generate_torch_ops.py
generated 625 overloads across 507 ops

python -m ruff format --check scripts/generate_torch_ops.py scripts/generate_wrappers.py tests/test_generate_torch_ops.py tests/test_torch_ops.py
4 files already formatted

python -m ruff check scripts/generate_torch_ops.py scripts/generate_wrappers.py tests/test_generate_torch_ops.py tests/test_torch_ops.py
All checks passed!

clang-format --dry-run --Werror src/hash.h
passed

Benchmark / Performance Impact

N/A — this PR changes generated API/backend plumbing and tests. The table above records build and test wall time for each platform to support follow-up compile-time optimization work.

Notes for Reviewers

  • Existing src/base/<op>.h overloads are treated as the public API when present. The generator binds compatible overloads to ATen schema parameters and fills omitted optional/default schema parameters at the ATen call site.
  • Generated fresh bases now expose supported optional types as std::optional<...>. PyTorch-internal optional types without stable InfiniOps representations remain hidden and are forwarded as typed empty optionals.
  • A full codegen pass currently reports 625 overloads across 507 ops. The generated metadata exposes optional parameters across existing and generated bases.
  • The test harness skips only known vendor-kernel crashes/divergences that otherwise terminate the Python process or compare mismatched vendor paths; PyTorch-backed tests are still collected and executed on every platform.
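
The overload binding described in the first note can be pictured with the minimal sketch below. It assumes an overload is compatible when every parameter it names exists in the ATen schema and every schema parameter it omits is either optional or has a default. `SchemaParam`, `bind_overload`, and the four-parameter schema in the usage note are hypothetical names for illustration, not the generator's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class SchemaParam:
    """Hypothetical view of one ATen schema parameter."""

    name: str
    cpp_type: str
    default: Optional[str] = None  # C++ default expression, if the schema has one.


def bind_overload(
    schema: Sequence[SchemaParam], overload_names: Sequence[str]
) -> Optional[List[str]]:
    """Return ATen call-site arguments for a base overload, or None if incompatible."""
    schema_names = {param.name for param in schema}

    # Every parameter of the hand-written overload must exist in the schema.
    if not set(overload_names) <= schema_names:
        return None

    args = []

    for param in schema:
        if param.name in overload_names:
            # Supplied by the overload: forward by name.
            args.append(param.name)
        elif param.default is not None:
            # Omitted with a schema default: forward the default, typed.
            args.append(f"{param.cpp_type}{{{param.default}}}")
        elif param.cpp_type.startswith("std::optional<"):
            # Omitted optional without a default: forward a typed empty optional.
            args.append(f"{param.cpp_type}{{}}")
        else:
            # A required parameter is missing, so the overload cannot bind.
            return None

    return args
```

With a hypothetical schema of `input`, optional `weight`, `eps` defaulting to `1e-5`, and `out`, an overload taking only `(input, out)` would bind to `input, std::optional<at::Tensor>{}, double{1e-5}, out`, while an overload that omits `out` would be rejected.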

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
  • No stray merge commits from master — the branch is rebased cleanly on top of the current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes are intentional, documented in this PR, and reflected in affected callers/tests.

General Code Hygiene

  • The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific

  • Code follows the Google C++ Style Guide strictly.
  • clang-format --dry-run --Werror src/hash.h passes.
  • N/A — clang-tidy was not run; no kernel or algorithm implementation path is added.
  • Operator parameter order is inputs first, outputs last; attributes are between inputs and outputs; naming follows PyTorch → ONNX → CUDA API precedence (CONTRIBUTING.md §C++).
  • No exceptions are thrown. No new C++ error path was added.
  • N/A — No new C++ error or warning message was added.
  • N/A — No kernel files are added or renamed.
  • N/A — No kernel launcher files are added or changed.
  • Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
  • Exactly one blank line between classes, between classes and functions, and between functions (CONTRIBUTING.md §C++).
  • Exactly one blank line between members within a class (CONTRIBUTING.md §C++).
  • Exactly one blank line before and after the contents of a namespace (CONTRIBUTING.md §C++).
  • N/A — No new hand-written operator implementation is added under src/base/<op>.h or platform implementation directories.
  • No raw new/delete; RAII / smart pointers / existing allocators are used.

Python Specific

  • Code is PEP 8 compliant; ruff check passes cleanly.
  • ruff format --check passes cleanly.
  • Comments are complete English sentences, starting with a capital letter and ending with punctuation; Markdown backticks are used for code references (CONTRIBUTING.md §Python).
  • Framework-specific conventions are honored where applicable (CONTRIBUTING.md §Python).
  • No blank line between the function signature and the body when there is no docstring or comment (CONTRIBUTING.md §Python).
  • A blank line is present before and after if, for, and similar control-flow statements (CONTRIBUTING.md §Python).
  • A blank line appears before each return, except when it directly follows a control-flow statement like if or for (CONTRIBUTING.md §Python).
  • Docstrings follow PEP 257 conventions.
  • Type hints are added / kept consistent with the surrounding code.

Testing

  • Full-platform pytest was run on all supported platforms with WITH_TORCH=ON.
  • N/A — No platform was unreachable.
  • New functionality has matching tests under tests/.
  • Tests use pytest.mark.parametrize correctly.
  • N/A — pytest.mark.auto_act_and_assert is not used by the generator unit tests or generated torch-op harness touched here.
  • Default dtype / device parameterization is relied on, or overridden with an explicit pytest.mark.parametrize when necessary.
  • Known vendor-kernel crashes/divergences are skipped explicitly to keep the full run progressing.
  • N/A — This is a feature PR rather than a bug-fix regression test PR.

Build, CI, and Tooling

  • The project builds cleanly from a fresh directory with pip install .[dev] on affected platforms.
  • compile_commands.json still regenerates through the existing CMake/scikit-build configuration path.
  • N/A — No new backend or device auto-detection is added.
  • Only one CUDA-like GPU backend is selectable at a time — the existing mutual-exclusion check in CMakeLists.txt is not changed.
  • ruff and clang-format checks are green.
  • No new runtime dependency was added without updating pyproject.toml's [project.optional-dependencies].

Documentation

  • N/A — No README, CONTRIBUTING, build flag, or developer workflow change is introduced.
  • N/A — No new operator, dispatch helper, or public utility is added outside generated code behavior.
  • N/A — No user-visible breaking change is intentionally introduced.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, IP addresses, or personal hardware identifiers have been committed or included in this PR description.
  • N/A — No third-party code is added.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

voltjia force-pushed the feat/torch-codegen-optional-overloads branch 4 times, most recently from 341f047 to 5e043a8, on May 20, 2026 12:58
voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 5e043a8 to d9714e7 on May 20, 2026 13:33
voltjia marked this pull request as ready for review on May 20, 2026 14:13
voltjia requested a review from a team on May 20, 2026 14:13