
Sync with Microsoft ONNX Runtime - 07042026#1028

Open
ai-fw-intg wants to merge 10 commits into ovep-develop from sync_msft_07042026

Conversation

@ai-fw-intg

Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase - use merge commit only.

hariharans29 and others added 10 commits April 6, 2026 00:42
This pull request introduces a new documentation page,
`PartitioningWithAnnotationsAndMemoryConstraints.md`, which explains
advanced ONNX Runtime features for partitioning model graphs across
devices with explicit control. The doc covers how to annotate model
layers for device assignment, collect per-node memory statistics, and
enforce GPU memory budgets during partitioning. These features enable
precise control over device placement and memory usage for large models.

The most important changes are:

**New Documentation: Advanced Partitioning Features**
* Adds a comprehensive guide
(`PartitioningWithAnnotationsAndMemoryConstraints.md`) describing how to
use ONNX Runtime’s layer annotation and memory constraint features for
graph partitioning.

**Layer Assignment via Annotations**
* Explains how to annotate ONNX model nodes with `layer_ann` metadata,
including manual annotation and automated annotation using Olive’s
`CaptureLayerAnnotations` pass.
* Provides configuration examples for mapping annotation patterns to
devices at runtime using the `session.layer_assignment_settings` session
option.
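As a minimal sketch of the mapping idea: the option key `session.layer_assignment_settings` comes from the doc above, but the value format, the `"pattern=device"` pair syntax, and the trailing-`.*` prefix-wildcard convention below are all assumptions for illustration, not ONNX Runtime's actual grammar.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical illustration: parse a mapping of annotation patterns to
// devices from a single config string, e.g. "encoder.*=GPU;decoder.*=CPU".
// The real value format of session.layer_assignment_settings may differ.
std::map<std::string, std::string> ParseLayerAssignments(const std::string& value) {
  std::map<std::string, std::string> assignments;
  std::istringstream stream(value);
  std::string entry;
  while (std::getline(stream, entry, ';')) {
    auto eq = entry.find('=');
    if (eq == std::string::npos) continue;  // skip malformed entries
    assignments[entry.substr(0, eq)] = entry.substr(eq + 1);
  }
  return assignments;
}

// Match a node's layer_ann value against a pattern, treating a trailing
// ".*" as a prefix wildcard (an assumed convention for this sketch).
bool PatternMatches(const std::string& pattern, const std::string& annotation) {
  if (pattern.size() >= 2 && pattern.compare(pattern.size() - 2, 2, ".*") == 0) {
    const std::string prefix = pattern.substr(0, pattern.size() - 2);
    return annotation.compare(0, prefix.size(), prefix) == 0;
  }
  return pattern == annotation;
}
```

At runtime such a mapping would be consulted once per annotated node during partitioning; exact-match patterns handle individually pinned layers.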

**Capacity-Aware Partitioning**
* Details a two-phase workflow for profiling per-node memory usage and
then enforcing a memory budget with the
`session.resource_cuda_partitioning_settings` session option.
* Covers both profiling-based and ad-hoc (estimation-only) approaches
for memory-constrained partitioning.
(docs/annotated_partitioning/PartitioningWithAnnotationsAndMemoryConstraints.md)
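The two-phase workflow above (profile per-node memory, then partition under a budget) can be illustrated with a toy greedy split. This is only a sketch of the budget idea; ONNX Runtime's actual capacity-aware partitioner behind `session.resource_cuda_partitioning_settings` is more sophisticated, and the `NodeCost` type here is invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Per-node memory statistic, as might be collected in the profiling phase.
struct NodeCost {
  std::string name;
  uint64_t bytes;
};

// Greedy capacity-aware split: keep nodes on the GPU until the byte budget
// is exhausted, then spill the remainder to CPU. Illustration only.
std::pair<std::vector<std::string>, std::vector<std::string>>
PartitionByBudget(const std::vector<NodeCost>& nodes, uint64_t budget_bytes) {
  std::vector<std::string> gpu_nodes, cpu_nodes;
  uint64_t used = 0;
  for (const auto& node : nodes) {
    if (used + node.bytes <= budget_bytes) {
      used += node.bytes;
      gpu_nodes.push_back(node.name);
    } else {
      cpu_nodes.push_back(node.name);  // over budget: fall back to CPU
    }
  }
  return {gpu_nodes, cpu_nodes};
}
```

In the ad-hoc (estimation-only) approach described above, the `bytes` figures would come from size estimates rather than a profiling run.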

This is a follow-up to microsoft#27595.
### Description
Increase version number to 1.26.0. The rel-1.25.0 release branch has
been cut.

### Changes
- VERSION_NUMBER: 1.25.0 → 1.26.0
- ORT_API_VERSION: 25 → 26 (header + C API struct rename)
- Python, JS, docs version strings updated via update_version.py
- C# NativeTrainingMethods ORT_API_VERSION: 23 → 26
- samples/cxx/README.md example paths updated
- docs/Versioning.md example updated

### Motivation and Context
Per release process: bump main branch version immediately after cutting
the release branch.
Proposal for CausalConvWithState and LinearAttention onnxruntime custom operators.
This follows the proposal in onnx/onnx#7767.
…osoft#27901)

Add ORT_ENFORCE checks in the SVMRegressor constructor to validate that
coefficients, support_vectors, and rho attribute array sizes are
consistent with the declared n_supports dimension. Without this
validation, a crafted model with undersized arrays causes the GEMM inner
loop to read past buffer boundaries.

This mirrors the existing validation already present in SVMClassifier.

- Validate rho is non-empty (accessed as rho_[0] in LINEAR mode, passed
to GEMM as bias in SVC mode)
- Validate coefficients.size() >= vector_count_ in SVC mode
- Validate feature_count_ > 0 after support_vectors division
- Add two unit tests for undersized coefficients and support_vectors
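The checks above can be sketched as a plain predicate. The real change uses `ORT_ENFORCE` (which throws on failure) inside the `SVMRegressor` constructor; the function name and bool-returning shape below are assumptions for a self-contained illustration.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the attribute-size validation described in the commit: returns
// true only when the coefficients, support_vectors, and rho array sizes are
// consistent with the declared n_supports dimension.
bool SvmRegressorAttributesValid(size_t n_supports,
                                 size_t coefficients_size,
                                 size_t support_vectors_size,
                                 size_t rho_size) {
  if (rho_size == 0) return false;  // rho_[0] is read in LINEAR mode
  if (n_supports > 0) {
    if (coefficients_size < n_supports) return false;  // GEMM reads n_supports coefficients
    if (support_vectors_size % n_supports != 0) return false;
    if (support_vectors_size / n_supports == 0) return false;  // feature_count_ must be > 0
  }
  return true;
}
```

Rejecting such models at session creation turns a silent out-of-bounds GEMM read into a load-time error.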

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…alues (microsoft#27789)

### Description
Fixes a heap out-of-bounds write (underflow) in the `Attention` contrib
operator's `PrepareMask` function. Negative values in the 1D
`mask_index` tensor were used directly as loop start indices without
bounds checking, allowing writes at negative offsets before the `p_mask`
buffer.

In `PrepareMask()` (`attention_helper.h`), `end_position` is read from
`mask_index[b_i]` and used as the starting index in a write loop with no
lower-bound validation. When `end_position` is negative, the loop writes
`mask_filter_value` at negative offsets, causing a heap buffer underflow. By
contrast, `start_position` was partially clamped via `std::min()` but
likewise lacked a lower bound.
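The fix amounts to clamping the raw `mask_index` value into a valid range before using it as a loop bound. The helper names below are invented for this sketch; the real change lives in `PrepareMask()` in `attention_helper.h`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Clamp a raw mask_index value to [0, sequence_length] so a negative value
// can no longer address memory before the mask buffer.
int64_t ClampMaskPosition(int64_t raw_position, int64_t sequence_length) {
  return std::min(std::max<int64_t>(raw_position, 0), sequence_length);
}

// Write filter_value from the clamped start position to the end of the mask,
// mirroring the write loop the commit describes (start index from mask_index).
void FillMaskFrom(std::vector<float>& mask, int64_t raw_start, float filter_value) {
  const int64_t start = ClampMaskPosition(raw_start, static_cast<int64_t>(mask.size()));
  std::fill(mask.begin() + start, mask.end(), filter_value);
}
```

With the clamp in place, a crafted model carrying negative mask indices degrades to a fully (or partially) filtered mask instead of corrupting the heap.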


…27752)

For the WebGPU EP:
+ ONNX RotaryEmbedding op
+ ONNX RMSNorm op
+ Reshape → opset 25
+ Transpose → opset 24
### Description

Extends the CUDA Transpose kernel registration from opset 23 to opset
25.

- **`transpose.cc`**: Cap existing opset 23 kernel to versioned `(23,
24)`, add new non-versioned kernel at opset 25
- **`cuda_execution_provider.cc`**: Update forward declarations and
`BuildKernelCreateInfo` entries to match; add new `// Opset 25` section
- **`docs/OperatorKernels.md`**: Update CUDA Transpose entry from `23+`
to `25+` with new `[23, 24]` versioned range

No functional or type constraint changes — the kernel implementation is
identical across these opsets.
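The registration change above boils down to a range rule: opsets 23–24 resolve to the newly capped versioned kernel, and 25 onward resolves to the new non-versioned entry. The toy resolver below only illustrates that matching rule; it is not ONNX Runtime's registry code, and the returned strings are invented labels.

```cpp
#include <cassert>
#include <string>

// Toy illustration of versioned-kernel resolution after this change: a
// kernel registered for (23, 24) serves only that inclusive range, while
// the non-versioned opset-25 kernel serves 25 and onward.
std::string ResolveTransposeKernel(int opset) {
  if (opset >= 23 && opset <= 24) return "Transpose(23,24)";
  if (opset >= 25) return "Transpose(25+)";
  return "older versioned kernel";  // opsets below 23 handled by earlier entries
}
```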

### Motivation and Context

CUDA EP's Transpose registration stopped at opset 23 while the ONNX spec
defines it through opset 25. This is one of the P1 gaps tracked in
microsoft#27729, following the same pattern as microsoft#27728.

### Limitations

This PR does not add support for the new data types introduced for Transpose:
- int2 (opset 25)
- float8e8m0 (opset 24)
- float4e2m1 (opset 23)
- float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz, uint4, int4
(opset 21)

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
8 participants