feat(dynamo): add target_executorch setting to keep output-allocator ops in PyTorch by shoumikhin · Pull Request #4355 · pytorch/TensorRT

shoumikhin · 2026-06-21T00:28:32Z

Description

Some converters require a TensorRT output allocator because their output shape is
data-dependent (for example aten.nonzero). Not every downstream runtime that executes
the compiled program can consume a TensorRT engine that needs an output allocator.
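
To make "data-dependent output shape" concrete, here is a minimal pure-Python illustration. The function nonzero_indices is a hypothetical stand-in for a 1-D aten.nonzero, not the real operator: two inputs of identical shape produce outputs of different lengths, so the output size cannot be known until the values are seen.

```python
from typing import List, Sequence


def nonzero_indices(values: Sequence[float]) -> List[int]:
    """Hypothetical 1-D stand-in for aten.nonzero: indices of nonzero entries."""
    return [i for i, v in enumerate(values) if v != 0]


# Both inputs have 4 elements, but the output lengths differ with the data.
a = nonzero_indices([0.0, 1.5, 0.0, 2.0])
b = nonzero_indices([3.0, 1.0, 4.0, 1.0])
print(len(a), len(b))  # prints "2 4"
```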

This adds a target_executorch compile setting (default False). When enabled, every
operator whose converter sets requires_output_allocator is routed to
torch_executed_ops and runs in PyTorch instead of being lowered into a TensorRT
engine. When disabled (the default), behavior is unchanged.

Details

- Discovery and routing are factored into two small helpers (_output_allocator_ops
  and _route_output_allocator_ops) so both are unit-testable without a GPU. The
  registry walk handles a single converter, a list/tuple, or a priority-keyed dict,
  and is conservative: if any converter for a target needs an allocator, the whole
  target is routed to PyTorch so an allocator engine is never emitted.
- Wired through compile() and cross_compile_for_windows(); the routing runs in
  compile_module(), which both entry points funnel through. It is intentionally not
  exposed on convert_exported_program_to_serialized_trt_engine(), where a single
  serialized engine cannot contain PyTorch fallbacks.
- Combining target_executorch with require_full_compilation raises a clear error,
  since routing ops to PyTorch contradicts full compilation.
- CompilationSettings.__setstate__ defaults the new field so older pickles load.
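
The discovery and routing logic described above can be sketched roughly as follows. This is a toy model, not the PR's code: ToyConverter, ToySettings, find_output_allocator_ops, route_output_allocator_ops, and the registry contents are hypothetical names made up for illustration, and the ValueError is an assumption (the description only says a clear error is raised).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Set


@dataclass(frozen=True)
class ToyConverter:
    """Hypothetical converter record; only the allocator flag matters here."""
    name: str
    requires_output_allocator: bool = False


@dataclass
class ToySettings:
    """Hypothetical stand-in for the compile settings involved."""
    target_executorch: bool = False
    require_full_compilation: bool = False
    torch_executed_ops: Set[str] = field(default_factory=set)


def _flatten(entry: Any) -> Iterable[Any]:
    """Yield converters from a single converter, a list/tuple, or a dict."""
    if isinstance(entry, dict):
        for value in entry.values():
            yield from _flatten(value)
    elif isinstance(entry, (list, tuple)):
        for item in entry:
            yield from _flatten(item)
    else:
        yield entry


def find_output_allocator_ops(registry: Dict[str, Any]) -> Set[str]:
    """Conservative walk: one allocator-requiring converter marks the target."""
    return {
        target
        for target, entry in registry.items()
        if any(getattr(c, "requires_output_allocator", False) for c in _flatten(entry))
    }


def route_output_allocator_ops(settings: ToySettings, registry: Dict[str, Any]) -> None:
    """Add allocator-requiring targets to torch_executed_ops when the flag is on."""
    if not settings.target_executorch:
        return  # flag off: settings are left untouched
    if settings.require_full_compilation:
        # Assumed error type; routing ops to PyTorch contradicts full compilation.
        raise ValueError("target_executorch conflicts with require_full_compilation")
    settings.torch_executed_ops |= find_output_allocator_ops(registry)


# Toy registry covering the three entry shapes: single, list, priority-keyed dict.
registry = {
    "aten.nonzero": ToyConverter("nonzero", requires_output_allocator=True),
    "aten.add": [ToyConverter("add_a"), ToyConverter("add_b")],
    "aten.unique": {"HIGH": ToyConverter("u_fast"), "STANDARD": [ToyConverter("u_alloc", True)]},
}

settings = ToySettings(target_executorch=True)
route_output_allocator_ops(settings, registry)
print(sorted(settings.torch_executed_ops))  # prints "['aten.nonzero', 'aten.unique']"
```

Note that aten.unique is routed even though its HIGH-priority toy converter needs no allocator, which mirrors the conservative rule stated above.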

The name is deliberate: it gates ExecuTorch-targeted routing, and further
ExecuTorch-specific behavior can accrete under the same flag.

Tests

tests/py/dynamo/models/test_target_executorch.py:

- the setting defaults to False and is settable;
- a state missing the field (older pickle) restores to False;
- output-allocator converters are discoverable via requires_output_allocator;
- routing is a no-op when the flag is off;
- routing adds the op to torch_executed_ops when on (CPU only, no GPU needed);
- combining with require_full_compilation raises;
- end to end on GPU, a data-dependent op (nonzero) falls back to PyTorch.
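
The older-pickle case in the list above might look roughly like this. It is a sketch under assumptions: ToySettings is a hypothetical class, not the real CompilationSettings, and the state dict is hand-built to imitate what an older pickle would carry. Calling __setstate__ directly mirrors what unpickling does when it restores an instance.

```python
from typing import Any, Dict


class ToySettings:
    """Hypothetical settings class used only to illustrate the pattern."""

    def __init__(self) -> None:
        self.require_full_compilation = False
        self.target_executorch = False

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # Older states predate the field, so supply the default before restoring.
        restored = dict(state)
        restored.setdefault("target_executorch", False)
        self.__dict__.update(restored)


# A state as an older version would have saved it: the new field is absent.
old_state = {"require_full_compilation": True}

# Unpickling creates the instance without calling __init__, then applies the state.
loaded = ToySettings.__new__(ToySettings)
loaded.__setstate__(old_state)
print(loaded.target_executorch, loaded.require_full_compilation)  # prints "False True"
```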

Type of change

New feature (non-breaking change which adds functionality)

Checklist

My code follows the style guidelines of this project (isort + black)
I have added tests that prove my fix/feature works
Commit is signed off (DCO)

Signed-off-by: shoumikhin <shoumikhin@meta.com>

meta-cla Bot added the cla signed label Jun 21, 2026

github-actions Bot added the component: tests, component: core, component: api [Python], and component: dynamo labels Jun 21, 2026

github-actions Bot requested a review from cehongwang June 21, 2026 00:28

shoumikhin force-pushed the target-executorch-setting branch from fb85c0d to 424b09f Compare June 21, 2026 10:28

shoumikhin force-pushed the target-executorch-setting branch from 424b09f to 85fa1eb Compare June 22, 2026 02:58

shoumikhin wants to merge 1 commit into pytorch:main from shoumikhin:target-executorch-setting
