Skip to content

[UR] Fix buffer allocation logic#21725

Open
kweronsx wants to merge 7 commits intosyclfrom
test/fix-cuda-use-host-pointer-buff
Open

[UR] Fix buffer allocation logic#21725
kweronsx wants to merge 7 commits intosyclfrom
test/fix-cuda-use-host-pointer-buff

Conversation

@kweronsx
Copy link
Copy Markdown
Contributor

@kweronsx kweronsx commented Apr 10, 2026

This PR fixes a long-standing issue (#9789) where UR_MEM_FLAG_USE_HOST_POINTER was silently ignored in the CUDA and HIP adapters, causing the host data to be copied to the device instead of being mapped.

What changed:

Removed the dead EnableUseHostPtr = false flag in both cuda/memory.cpp and hip/memory.cpp. This flag was hardcoded to false with a TODO comment, meaning the USE_HOST_PTR path was never actually taken — the buffer always fell back to copying.

Enabled the UseHostPtr allocation mode by properly entering that branch when UR_MEM_FLAG_USE_HOST_POINTER is set. The hipHostRegister / cuMemHostRegister call was moved from urMemBufferCreate into allocateMemObjOnDeviceIfNeeded (i.e. it is deferred to lazy device allocation), which is where it already lived.

Added HostPtrRegisteredByUR ownership flag to BufferMem in both headers. This boolean is set to true only when UR itself performed the registration. It ensures cuMemHostUnregister / hipHostUnregister is only called by the owner, preventing a double-unregister.

Handled already-registered pointers gracefully: CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED / hipErrorHostMemoryAlreadyRegistered is now tolerated in allocateMemObjOnDeviceIfNeeded. This covers cases where the same host pointer is registered across multiple lazy device allocations (e.g. multi-device contexts).

Explicit copy constructors were added to BufferMem in both adapters to deliberately not copy HostPtrRegisteredByUR, making ownership semantics clear — only the original object is responsible for unregistering.

Conformance test update: CUDA and HIP are removed from the UseHostPointer known-failure list in urMemBufferCreate.cpp, reflecting that the feature now works correctly on those backends.

@kweronsx kweronsx force-pushed the test/fix-cuda-use-host-pointer-buff branch 3 times, most recently from 8a2d0e3 to 6d27f2a Compare April 14, 2026 14:05
…tration

Previously, UR_MEM_FLAG_USE_HOST_POINTER was effectively ignored in both
adapters due to a dead `EnableUseHostPtr = false` flag, causing the host
data to be copied to the device instead of being mapped. This change
removes that dead flag and properly enables the USE_HOST_PTR path, calling
cuMemHostRegister / hipHostRegister to map the caller-provided host pointer
for device access.

In allocateMemObjOnDeviceIfNeeded, CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED
/ hipErrorHostMemoryAlreadyRegistered is now tolerated to handle the case
where the same host pointer is registered more than once across multiple
lazy device allocations.

The conformance test urMemBufferCreate.UseHostPointer is updated to remove
CUDA and HIP from the known-failure list.

Resolves: #9789
@kweronsx kweronsx marked this pull request as ready for review April 15, 2026 09:12
@kweronsx kweronsx requested review from a team as code owners April 15, 2026 09:12
@kweronsx kweronsx requested a review from kekaczma April 15, 2026 09:12
@kweronsx
Copy link
Copy Markdown
Contributor Author

This PR started from an attempt to get the UseHostPointer conformance test in urMemBufferCreate to pass for CUDA and HIP.

UR_MEM_FLAG_USE_HOST_POINTER had been silently treated as a copy operation in both adapters, hidden behind a hardcoded EnableUseHostPtr = false guard. The guard was accompanied by a comment citing a "weird segfault after program ends," but no such segfault was observed, so the dead code has been removed.

Enabling the feature revealed one real issue: cuMemHostRegister/hipHostRegister is called once during buffer creation in urMemBufferCreate, and then again in allocateMemObjOnDeviceIfNeeded at kernel submission time, resulting in a double-registration. This is handled by tolerating CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED/hipErrorHostMemoryAlreadyRegistered in allocateMemObjOnDeviceIfNeeded instead of propagating it as an error.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Unified Runtime buffer creation behavior to better support UR_MEM_FLAG_USE_HOST_POINTER by using host-memory registration (CUDA/HIP) rather than falling back to an initial copy, and adjusts conformance expectations accordingly.

Changes:

  • Enable UR_MEM_FLAG_USE_HOST_POINTER paths for HIP and CUDA adapters (removing the previous “disabled” gating logic).
  • Refine “initial copy” logic so it only triggers for UR_MEM_FLAG_ALLOC_COPY_HOST_POINTER.
  • Update conformance test expectations by removing HIP/CUDA from the known-failing UseHostPointer test list.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
unified-runtime/test/conformance/memory/urMemBufferCreate.cpp Removes HIP/CUDA from known failures for UseHostPointer conformance test.
unified-runtime/source/adapters/hip/memory.cpp Switches USE_HOST_POINTER to UseHostPtr alloc mode and adjusts registration behavior/error handling.
unified-runtime/source/adapters/cuda/memory.cpp Switches USE_HOST_POINTER to UseHostPtr alloc mode and adjusts registration error handling.
Comments suppressed due to low confidence (1)

unified-runtime/source/adapters/hip/memory.cpp:107

  • UR_MEM_FLAG_USE_HOST_POINTER now sets AllocMode::UseHostPtr, but urMemBufferCreate no longer registers the host memory. Since BufferMem::clear() unconditionally calls hipHostUnregister(HostPtr) for UseHostPtr, releasing a buffer that was never used on a device will attempt to unregister an unregistered pointer and can fail/crash. Register the host pointer in urMemBufferCreate when USE_HOST_POINTER is set, or track whether registration actually happened and only unregister in that case.
    auto HostPtr = pProperties ? pProperties->pHost : nullptr;
    BufferMem::AllocMode AllocMode = BufferMem::AllocMode::Classic;
    if (flags & UR_MEM_FLAG_USE_HOST_POINTER) {
      AllocMode = BufferMem::AllocMode::UseHostPtr;
    } else if (flags & UR_MEM_FLAG_ALLOC_HOST_POINTER) {
      UR_CHECK_ERROR(hipHostMalloc(&HostPtr, size));
      AllocMode = BufferMem::AllocMode::AllocHostPtr;
    } else if (flags & UR_MEM_FLAG_ALLOC_COPY_HOST_POINTER) {
      AllocMode = BufferMem::AllocMode::CopyIn;
    }

Comment thread unified-runtime/source/adapters/hip/memory.cpp
Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
Comment thread unified-runtime/source/adapters/cuda/memory.cpp
Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
@kweronsx kweronsx marked this pull request as draft April 15, 2026 12:25
@kweronsx kweronsx requested a review from ldorau April 15, 2026 13:39
@kweronsx kweronsx marked this pull request as ready for review April 15, 2026 13:39
Fix CU_MEMHOSTALLOC_DEVICEMAP → CU_MEMHOSTREGISTER_DEVICEMAP in the
CUDA adapter's allocateMemObjOnDeviceIfNeeded (wrong flag for
cuMemHostRegister).

Add HostPtrRegisteredByUR flag to HIP BufferMem to track whether UR
performed the hipHostRegister call. clear() now only calls
hipHostUnregister when UR owns the registration, preventing incorrect
unregistration of user-provided already-registered memory.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
Comment thread unified-runtime/source/adapters/cuda/memory.cpp
Comment thread unified-runtime/source/adapters/hip/memory.hpp
Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
@kweronsx kweronsx marked this pull request as draft April 16, 2026 06:06
@kweronsx kweronsx marked this pull request as ready for review April 16, 2026 07:12
@kweronsx kweronsx requested a review from ldorau April 16, 2026 07:12
Remove eager cuMemHostRegister from urMemBufferCreate; registration is
now lazy-only in allocateMemObjOnDeviceIfNeeded, matching the HIP
approach.
Copy link
Copy Markdown
Contributor

@ldorau ldorau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues left:

Comment thread unified-runtime/source/adapters/cuda/memory.cpp Outdated
Comment thread unified-runtime/source/adapters/cuda/memory.hpp Outdated
@kweronsx kweronsx marked this pull request as draft April 16, 2026 08:18
Replace the defaulted BufferMem copy constructor with an explicit one
that does not copy HostPtrRegisteredByUR, so sub-buffers do not inherit
registration ownership from their parent.
@kweronsx kweronsx requested a review from ldorau April 16, 2026 09:18
@kweronsx kweronsx marked this pull request as ready for review April 16, 2026 09:19
Comment thread unified-runtime/source/adapters/hip/memory.hpp
@kweronsx kweronsx marked this pull request as draft April 16, 2026 09:34
Add explicit copy constructor to HIP BufferMem that does not copy
HostPtrRegisteredByUR, matching the fix already applied to the CUDA
adapter. Prevents sub-buffers from inheriting registration ownership
from their parent.
@kweronsx kweronsx marked this pull request as ready for review April 16, 2026 10:12
@kweronsx kweronsx requested a review from ldorau April 16, 2026 10:12
@ldorau ldorau requested a review from Copilot April 16, 2026 10:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comment thread unified-runtime/source/adapters/hip/memory.hpp
@kweronsx kweronsx marked this pull request as draft April 16, 2026 10:29
@kweronsx kweronsx marked this pull request as ready for review April 16, 2026 10:35
@kweronsx kweronsx requested a review from ldorau April 16, 2026 10:35
@ldorau ldorau requested a review from Copilot April 16, 2026 10:38
Copy link
Copy Markdown
Contributor

@ldorau ldorau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@ldorau
Copy link
Copy Markdown
Contributor

ldorau commented Apr 16, 2026

Review please @intel/llvm-reviewers-cuda

@ldorau
Copy link
Copy Markdown
Contributor

ldorau commented Apr 16, 2026

@kweronsx please update the description of this PR (it is empty now) #21725 (comment), because it will be used as the description of the final commit squashed from all commits from this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants