feat(torch): expose optional codegen parameters by voltjia · Pull Request #619 · InfiniTensor/InfiniOps

voltjia · 2026-05-20T08:55:15Z

Summary

Expose supported ATen optional parameters as stable InfiniOps C++ parameters in generated PyTorch operator bases.
Bind generated PyTorch backends to existing src/base/<op>.h overloads when available, forwarding omitted optional/default ATen parameters as typed defaults.
Add std::optional<T> support to operator cache hashing and update the generated torch-op test harness for optional arguments and known vendor-specific PyTorch crashes/divergences.
Add generator tests for optional parameter exposure and existing-base overload binding.
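As a rough illustration of the first summary point, the sketch below splits optional ATen schema parameters into those exposed as `std::optional<T>` and those kept hidden. The mapping table, `SchemaParam`, and `split_optional_params` are hypothetical names introduced here. The set of optional types the real generator supports is not listed in this description, so the three entries shown are an assumption.

```python
from dataclasses import dataclass

# Hypothetical mapping from optional ATen schema types to C++ parameter types.
# Assumption: the real generator's supported set may differ from these entries.
_STABLE_OPTIONAL_TYPES = {
    "int?": "std::optional<int64_t>",
    "float?": "std::optional<double>",
    "bool?": "std::optional<bool>",
}


@dataclass(frozen=True)
class SchemaParam:
    name: str
    aten_type: str


def split_optional_params(params):
    """Return `(exposed, hidden)` for the optional parameters in `params`."""
    exposed, hidden = [], []

    for param in params:
        if not param.aten_type.endswith("?"):
            continue  # Required parameters are outside this sketch.

        cpp_type = _STABLE_OPTIONAL_TYPES.get(param.aten_type)

        if cpp_type is None:
            # No stable representation: stays hidden, forwarded as an empty optional.
            hidden.append(param.name)
        else:
            exposed.append(f"{cpp_type} {param.name}")

    return exposed, hidden
```

Under these assumptions, a schema with `int? dim` and `Generator? generator` would expose `dim` and keep `generator` hidden.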

Motivation

The PyTorch code generator previously hid optional ATen schema parameters and always forwarded typed nullopt values. That made generated APIs unable to exercise non-default optional behavior and caused drift against operator base headers that intentionally expose optional parameters. This PR makes optional schema handling explicit while keeping existing hand-written bases as the public API source of truth when they are present.

Closes # N/A — this is follow-up work from the PyTorch codegen/base drift discussion.

Type of Change

feat — new feature / new operator / new platform
N/A — fix — bug fix.
N/A — perf — performance improvement (no behavioral change).
N/A — refactor — code restructuring without behavior change.
N/A — test — adding or fixing tests only.
N/A — docs — documentation only.
N/A — build / ci — build system or CI configuration.
N/A — chore — tooling, formatting, or other non-code changes.
N/A — Breaking change.

Platforms Affected

CPU (WITH_CPU)
NVIDIA (WITH_NVIDIA)
Iluvatar (WITH_ILUVATAR)
MetaX (WITH_METAX)
Cambricon (WITH_CAMBRICON)
Moore (WITH_MOORE)
Ascend (WITH_ASCEND)
PyTorch C++ bindings (WITH_TORCH)
N/A — Build system / CMake / CI; no CMake or CI files are changed.
Python bindings / user-facing API

Test Results on Supported Platforms

All runs generated PyTorch operator sources before build and installed with WITH_TORCH=ON. Full pytest was run with verbose test names enabled.

| Platform | Built | `pytest` Result | Build Time | Test Time | Total Time | Notes / Hardware |
| --- | --- | --- | --- | --- | --- | --- |
| NVIDIA | Yes | `9273 passed, 8600 skipped` | 947s | 432s | 1384s | PyTorch backend compiled and `tests/test_torch_ops.py` ran. |
| Iluvatar | Yes | `7771 passed, 8584 skipped` | 766s | 589s | 1355s | PyTorch backend compiled and `tests/test_torch_ops.py` ran. |
| MetaX | Yes | `8765 passed, 7590 skipped` | 1344s | 449s | 1794s | PyTorch backend compiled and `tests/test_torch_ops.py` ran. |
| Cambricon | Yes | `5968 passed, 10003 skipped` | 2438s | 1023s | 3461s | PyTorch backend compiled and `tests/test_torch_ops.py` ran. |
| Moore | Yes | `8531 passed, 7842 skipped` | 2174s | 607s | 2781s | PyTorch backend compiled and `tests/test_torch_ops.py` ran; `native_batch_norm` is skipped for a `torch_musa` `_out`/functional divergence. |
| Ascend | Yes | `7447 passed, 8866 skipped` | 1089s | 691s | 1826s | Pytest summary passed; the container exited with code 137 after `pytest_end`, matching the known post-test Ascend container behavior. |

Validation details

python scripts/generate_torch_ops.py
generated 625 overloads across 507 ops

python -m ruff format --check scripts/generate_torch_ops.py scripts/generate_wrappers.py tests/test_generate_torch_ops.py tests/test_torch_ops.py
4 files already formatted

python -m ruff check scripts/generate_torch_ops.py scripts/generate_wrappers.py tests/test_generate_torch_ops.py tests/test_torch_ops.py
All checks passed!

clang-format --dry-run --Werror src/hash.h
passed

Benchmark / Performance Impact

N/A — this PR changes generated API/backend plumbing and tests. The table above records build and test wall time for each platform to support follow-up compile-time optimization work.

Notes for Reviewers

Existing src/base/<op>.h overloads are treated as the public API when present. The generator binds compatible overloads to ATen schema parameters and fills omitted optional/default schema parameters at the ATen call site.
Generated fresh bases now expose supported optional types as std::optional<...>. PyTorch-internal optional types without stable InfiniOps representations remain hidden and are forwarded as typed empty optionals.
A full codegen pass currently reports 625 overloads across 507 ops. The generated metadata exposes optional parameters across existing and generated bases.
The test harness skips only known vendor-kernel crashes/divergences that otherwise terminate the Python process or compare mismatched vendor paths; PyTorch-backed tests are still collected and executed on every platform.
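The binding rule in the first note can be sketched as follows: an existing base overload is compatible when every parameter it names exists in the ATen schema, and every schema parameter it omits has a default to forward. This is a minimal sketch, not the generator's actual code. `bind_call_args`, the `(name, cpp_default)` pair layout, and the example default strings are all hypothetical.

```python
def bind_call_args(schema_params, base_param_names):
    """Build the ATen call-site argument list for one base overload.

    `schema_params` is a list of `(name, cpp_default)` pairs in schema order,
    where `cpp_default` is `None` for required parameters. Returns `None`
    when the overload cannot be bound to the schema.
    """
    schema_names = {name for name, _ in schema_params}

    # The overload may not name a parameter the schema does not have.
    if not set(base_param_names) <= schema_names:
        return None

    args = []

    for name, cpp_default in schema_params:
        if name in base_param_names:
            args.append(name)  # Forward the caller-supplied value.
        elif cpp_default is not None:
            args.append(cpp_default)  # Fill the omitted parameter with its typed default.
        else:
            return None  # A required schema parameter is missing from the overload.

    return args
```

For a hypothetical schema `[("input", None), ("dim", "std::optional<int64_t>{}"), ("keepdim", "false"), ("out", None)]`, an overload taking only `input` and `out` would forward both defaults, while one that also takes `dim` would forward only `false` for `keepdim`.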

Checklist

Title, Branch, and Commits

PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
Each commit message follows Conventional Commits.
Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
No stray merge commits from master — the branch is rebased cleanly on top of the current master.
No fixup! / squash! / wip commits remain.

Scope and Design

Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
No unrelated formatting churn that would obscure the diff.
Public API changes are intentional, documented in this PR, and reflected in affected callers/tests.

General Code Hygiene

The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
No trailing whitespace, tab/space mixing, or stray BOMs.
Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
All comments and error messages are in English (CONTRIBUTING.md §Code/General).
Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific

Code follows the Google C++ Style Guide strictly.
clang-format --dry-run --Werror src/hash.h passes.
N/A — clang-tidy was not run; no kernel or algorithm implementation path is added.
Operator parameter order is inputs first, outputs last; attributes are between inputs and outputs; naming follows PyTorch → ONNX → CUDA API precedence (CONTRIBUTING.md §C++).
No exceptions are thrown. No new C++ error path was added.
N/A — No new C++ error or warning message was added.
N/A — No kernel files are added or renamed.
N/A — No kernel launcher files are added or changed.
Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
Exactly one blank line between classes, between classes and functions, and between functions (CONTRIBUTING.md §C++).
Exactly one blank line between members within a class (CONTRIBUTING.md §C++).
Exactly one blank line before and after the contents of a namespace (CONTRIBUTING.md §C++).
N/A — No new hand-written operator implementation is added under src/base/<op>.h or platform implementation directories.
No raw new/delete; RAII / smart pointers / existing allocators are used.

Python Specific

Code is PEP 8 compliant; ruff check passes cleanly.
ruff format --check passes cleanly.
Comments are complete English sentences, starting with a capital letter and ending with punctuation; Markdown backticks are used for code references (CONTRIBUTING.md §Python).
Framework-specific conventions are honored where applicable (CONTRIBUTING.md §Python).
No blank line between the function signature and the body when there is no docstring or comment (CONTRIBUTING.md §Python).
A blank line is present before and after if, for, and similar control-flow statements (CONTRIBUTING.md §Python).
A blank line appears before each return, except when it directly follows a control-flow statement like if or for (CONTRIBUTING.md §Python).
Docstrings follow PEP 257 conventions.
Type hints are added / kept consistent with the surrounding code.

Testing

Full-platform pytest was run on all supported platforms with WITH_TORCH=ON.
N/A — No platform was unreachable.
New functionality has matching tests under tests/.
Tests use pytest.mark.parametrize correctly.
N/A — pytest.mark.auto_act_and_assert is not used by the generator unit tests or generated torch-op harness touched here.
Default dtype / device parameterization is relied on, or overridden with an explicit pytest.mark.parametrize when necessary.
Known vendor-kernel crashes/divergences are skipped explicitly to keep the full run progressing.
N/A — This is a feature PR rather than a bug-fix regression test PR.

Build, CI, and Tooling

The project builds cleanly from a fresh directory with pip install .[dev] on affected platforms.
compile_commands.json still regenerates through the existing CMake/scikit-build configuration path.
N/A — No new backend or device auto-detection is added.
Only one CUDA-like GPU backend is selectable at a time — the existing mutual-exclusion check in CMakeLists.txt is not changed.
ruff and clang-format checks are green.
No new runtime dependency was added without updating pyproject.toml's [project.optional-dependencies].

Documentation

N/A — No README, CONTRIBUTING, build flag, or developer workflow change is introduced.
N/A — No new operator, dispatch helper, or public utility is added outside generated code behavior.
N/A — No user-visible breaking change is intentionally introduced.

Security and Safety

No secrets, access tokens, internal URLs, customer data, IP addresses, or personal hardware identifiers have been committed or included in this PR description.
N/A — No third-party code is added.
No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

voltjia force-pushed the feat/torch-codegen-optional-overloads branch 4 times, most recently from 341f047 to 5e043a8 on May 20, 2026 12:58

feat(torch): expose optional codegen parameters

d9714e7

voltjia force-pushed the feat/torch-codegen-optional-overloads branch from 5e043a8 to d9714e7 on May 20, 2026 13:33

voltjia marked this pull request as ready for review May 20, 2026 14:13

voltjia requested a review from a team May 20, 2026 14:13

voltjia wants to merge 1 commit into master from feat/torch-codegen-optional-overloads
