Sync with Microsoft ONNX Runtime - 07042026 #1028
Open
ai-fw-intg wants to merge 10 commits into ovep-develop from
### Description
Fixes ICM issue https://portal.microsofticm.com/imp/v5/incidents/details/31000000562663/summary

### Motivation and Context
Fix ICMs.
This pull request introduces a new documentation page, `PartitioningWithAnnotationsAndMemoryConstraints.md`, which explains advanced ONNX Runtime features for partitioning model graphs across devices with explicit control. The doc covers how to annotate model layers for device assignment, collect per-node memory statistics, and enforce GPU memory budgets during partitioning. These features enable precise control over device placement and memory usage for large models.

The most important changes are:

**New Documentation: Advanced Partitioning Features**
* Adds a comprehensive guide (`PartitioningWithAnnotationsAndMemoryConstraints.md`) describing how to use ONNX Runtime’s layer annotation and memory constraint features for graph partitioning.

**Layer Assignment via Annotations**
* Explains how to annotate ONNX model nodes with `layer_ann` metadata, including manual annotation and automated annotation using Olive’s `CaptureLayerAnnotations` pass.
* Provides configuration examples for mapping annotation patterns to devices at runtime using the `session.layer_assignment_settings` session option.

**Capacity-Aware Partitioning**
* Details a two-phase workflow: first profile per-node memory usage, then enforce a memory budget with the `session.resource_cuda_partitioning_settings` session option.
* Covers both profiling-based and ad-hoc (estimation-only) approaches for memory-constrained partitioning.

Changed file: `docs/annotated_partitioning/PartitioningWithAnnotationsAndMemoryConstraints.md` (R1-R267).

This is a follow-up to microsoft#27595.
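To make the two ideas concrete (route nodes to devices by annotation pattern, then keep GPU placement within a memory budget), here is a minimal plain-Python sketch under stated assumptions. The `Node` type, the glob-style patterns, the per-node byte estimates, and the greedy spill-to-CPU policy are all hypothetical illustrations; this is not ONNX Runtime's partitioning algorithm, and it does not show the value syntax of `session.layer_assignment_settings` or `session.resource_cuda_partitioning_settings`, which the new doc defines.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Node:
    name: str
    layer_ann: str  # hypothetical stand-in for a node's `layer_ann` annotation
    est_bytes: int  # hypothetical per-node memory estimate, e.g. from a profiling run


def assign_devices(nodes, rules, gpu_budget_bytes, default="cpu"):
    """Greedy placement sketch (not ONNX Runtime's algorithm).

    `rules` is an ordered list of (glob_pattern, device); the first pattern that
    matches a node's annotation picks its device. A node routed to "gpu" falls
    back to `default` when adding it would exceed the GPU budget.
    """
    placement = {}
    gpu_bytes_used = 0
    for node in nodes:  # assumes nodes are visited in topological order
        device = default
        for pattern, target in rules:
            if fnmatchcase(node.layer_ann, pattern):
                device = target
                break
        if device == "gpu":
            if gpu_bytes_used + node.est_bytes <= gpu_budget_bytes:
                gpu_bytes_used += node.est_bytes
            else:
                device = default  # over budget: spill this node to the fallback device
        placement[node.name] = device
    return placement, gpu_bytes_used


# Usage with made-up numbers: the second decoder layer no longer fits in 500 bytes.
nodes = [
    Node("embed", "embedding", 100),
    Node("l0_attn", "layers.0.attn", 300),
    Node("l1_attn", "layers.1.attn", 300),
    Node("head", "lm_head", 50),
]
rules = [("layers.*", "gpu"), ("lm_head", "gpu")]
placement, used = assign_devices(nodes, rules, gpu_budget_bytes=500)
```

A single greedy pass keeps later nodes eligible for the GPU after an earlier one spills (here `head` still fits after `l1_attn` is pushed to the CPU). A real partitioner may also weigh the cost of cross-device transfers, which this sketch ignores.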
### Description
Increase version number to 1.26.0. The rel-1.25.0 release branch has been cut.

### Changes
- VERSION_NUMBER: 1.25.0 → 1.26.0
- ORT_API_VERSION: 25 → 26 (header + C API struct rename)
- Python, JS, docs version strings updated via update_version.py
- C# NativeTrainingMethods ORT_API_VERSION: 23 → 26
- samples/cxx/README.md example paths updated
- docs/Versioning.md example updated

### Motivation and Context
Per the release process, bump the main branch version immediately after cutting the release branch.
Proposal for the CausalConvWithState and LinearAttention ONNX Runtime custom operators. This follows the proposal in onnx/onnx#7767.
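As its name suggests, CausalConvWithState presumably carries convolution history between calls so a sequence can be processed in chunks. The sketch below shows that general technique for a single channel in plain Python: a causal 1D convolution whose state is the last K-1 inputs, so chunked processing reproduces the one-shot result. The kernel orientation, the oldest-first state layout, and the function name are assumptions for illustration only; the operator's real inputs, shapes, and attributes are defined in the proposal, not here.

```python
def causal_conv1d_with_state(x, weights, state):
    """Single-channel causal convolution that carries state between calls.

    weights: kernel of length K; weights[-1] multiplies the current sample.
    state:   the previous K-1 inputs, oldest first (zeros at the start of a stream).
    Returns (outputs, new_state), where new_state is the last K-1 inputs seen.
    """
    k = len(weights)
    if len(state) != k - 1:
        raise ValueError("state must hold exactly K-1 samples")
    buf = list(state) + list(x)  # history followed by the new chunk
    out = []
    for t in range(len(x)):
        window = buf[t:t + k]  # ends at x[t], so no future samples are used
        out.append(sum(w * v for w, v in zip(weights, window)))
    new_state = buf[len(buf) - (k - 1):]
    return out, new_state


weights = [0.5, -1.0, 2.0]
x = [1, 2, 3, 4, 5]

# Whole sequence in one call vs. two chunks with the state carried over.
full, _ = causal_conv1d_with_state(x, weights, [0, 0])
first, state = causal_conv1d_with_state(x[:2], weights, [0, 0])
second, state = causal_conv1d_with_state(x[2:], weights, state)
```

Here `first + second` equals `full` and `full` is `[2.0, 3.0, 4.5, 6.0, 7.5]`, which is the property a stateful streaming operator needs.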
…osoft#27901)

Add ORT_ENFORCE checks in the SVMRegressor constructor to validate that the coefficients, support_vectors, and rho attribute array sizes are consistent with the declared n_supports dimension. Without this validation, a crafted model with undersized arrays causes the GEMM inner loop to read past buffer boundaries. This mirrors the existing validation already present in SVMClassifier.

- Validate that rho is non-empty (accessed as rho_[0] in LINEAR mode, passed to GEMM as bias in SVC mode)
- Validate coefficients.size() >= vector_count_ in SVC mode
- Validate feature_count_ > 0 after support_vectors division
- Add two unit tests for undersized coefficients and support_vectors

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
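The checks listed above can be sketched in Python to show how each one rejects an undersized attribute array before any GEMM runs. This is a hypothetical illustration, not the C++ patch: `EnforceError`, `enforce`, `validate_svm_regressor`, and `is_rejected` are made-up names, and the rule that SVC mode is selected when `n_supports > 0` is an assumption of the sketch.

```python
class EnforceError(ValueError):
    """Hypothetical stand-in for a failed ORT_ENFORCE, which throws in the C++ code."""


def enforce(condition, message):
    if not condition:
        raise EnforceError(message)


def validate_svm_regressor(n_supports, coefficients, support_vectors, rho):
    """Python sketch of the constructor checks described in the commit.

    Assumes (hypothetically) SVC mode when n_supports > 0, LINEAR otherwise.
    Returns (mode, feature_count); feature_count is None in LINEAR mode here.
    """
    # rho_[0] is read in LINEAR mode and rho is the GEMM bias in SVC mode.
    enforce(len(rho) > 0, "rho must be non-empty")
    if n_supports > 0:
        enforce(len(coefficients) >= n_supports,
                "coefficients has fewer entries than n_supports")
        feature_count = len(support_vectors) // n_supports
        enforce(feature_count > 0, "support_vectors too small for n_supports")
        return "SVC", feature_count
    return "LINEAR", None


def is_rejected(*args):
    """True if the given attributes would fail validation."""
    try:
        validate_svm_regressor(*args)
    except EnforceError:
        return True
    return False
```

For example, `is_rejected(2, [0.5], [1.0, 2.0, 3.0, 4.0], [0.1])` is `True` because two support vectors need at least two coefficients.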
…alues (microsoft#27789)

### Description
Fixes a heap out-of-bounds write (underflow) in the `Attention` contrib operator's `PrepareMask` function. Negative values in the 1D `mask_index` tensor were used directly as loop start indices without bounds checking, allowing writes at negative offsets before the `p_mask` buffer.

In `PrepareMask()` (`attention_helper.h`), `end_position` is read from `mask_index[b_i]` and used as the starting index of a write loop with no lower-bound validation. When `end_position` is negative, the loop writes `mask_filter_value` at negative offsets, which is a heap buffer underflow. `start_position` had the same gap: it was clamped from above via `std::min()` but had no lower bound.
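A minimal sketch of the clamping idea, assuming one plausible policy: clamp both positions from `mask_index` into `[0, seq_len]` before using them as loop bounds. `prepare_mask_row` is a hypothetical helper and the `-10000.0` default is just an illustrative large negative constant; this is not the actual patch. Note that in Python an unclamped negative index would silently wrap to the end of the list rather than write before the buffer as the C++ code did.

```python
def prepare_mask_row(seq_len, end_position, start_position=0, mask_filter_value=-10000.0):
    """Sketch of one row of a key-padding mask built from mask_index values.

    Keys in [start, end) are attended (0.0); all others get mask_filter_value.
    Both positions come from untrusted model input, so they are clamped into
    [0, seq_len] before being used as loop bounds.
    """
    end = max(0, min(end_position, seq_len))
    start = max(0, min(start_position, seq_len))
    row = [0.0] * seq_len
    for i in range(end, seq_len):  # a negative `end` can no longer reach before the row
        row[i] = mask_filter_value
    for i in range(start):
        row[i] = mask_filter_value
    return row
```

With this policy a negative `end_position` masks the whole row (`prepare_mask_row(4, -3)` returns four `-10000.0` entries) instead of touching memory outside it.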
…27752) for WebGPU EP:
+ ONNX `RotaryEmbedding` op
+ ONNX `RMSNormalization` op
+ Reshape -> opset 25
+ Transpose -> opset 24
### Description
Extends the CUDA Transpose kernel registration from opset 23 to opset 25.

- **`transpose.cc`**: Cap the existing opset 23 kernel to the versioned range `(23, 24)` and add a new non-versioned kernel at opset 25
- **`cuda_execution_provider.cc`**: Update forward declarations and `BuildKernelCreateInfo` entries to match; add a new `// Opset 25` section
- **`docs/OperatorKernels.md`**: Update the CUDA Transpose entry from `23+` to `25+` with a new `[23, 24]` versioned range

No functional or type-constraint changes: the kernel implementation is identical across these opsets.

### Motivation and Context
The CUDA EP's Transpose registration stopped at opset 23, while the ONNX spec defines it through opset 25. This is one of the P1 gaps tracked in microsoft#27729, following the same pattern as microsoft#27728.

### Limitation
This PR does not add support for the following new Transpose data types:
- int2 (opset 25)
- float8e8m0 (opset 24)
- float4e2m1 (opset 23)
- float8e4m3fn, float8e4m3fnuz, float8e5m2, float8e5m2fnuz, uint4, int4 (opset 21)

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
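Why capping the old kernel and adding a new one matters can be modeled with a small lookup. The sketch assumes a matching rule in which a versioned kernel covers every schema since-version in its `[start, end]` range, while a non-versioned kernel only matches a node whose since-version equals its own start version. That rule is our reading of how ONNX Runtime compares kernel registrations with a node's schema version and should be treated as an assumption; `KernelReg`, `matches`, `resolve`, and the kernel names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KernelReg:
    start: int           # first schema since-version the kernel covers
    end: Optional[int]   # last covered since-version, or None for a non-versioned kernel
    name: str


def matches(reg, since_version):
    """Assumed rule: versioned kernels cover [start, end]; non-versioned kernels
    only match their own start version."""
    if reg.end is None:
        return reg.start == since_version
    return reg.start <= since_version <= reg.end


def resolve(regs, since_version):
    """Return the name of the first registration matching the node, else None."""
    for reg in regs:
        if matches(reg, since_version):
            return reg.name
    return None


# Hypothetical registrations before and after the change.
before = [KernelReg(23, None, "Transpose_23")]
after = [KernelReg(23, 24, "Transpose_23_24"), KernelReg(25, None, "Transpose_25")]
```

Under this rule, Transpose nodes whose schema comes from opset 24 or 25 find no CUDA kernel with `before`, and find `Transpose_23_24` or `Transpose_25` respectively with `after`.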
Automated daily backmerge from ORT main to ovep-develop. No conflicts detected. Do NOT squash or rebase; use a merge commit only.