Skip to content

Fix HIP CannyEdgeDetector failure on repeated vxProcessGraph#1

Open
simonCatBot wants to merge 1 commit into
developfrom
fix/hip-canny-stack-counter-reset
Open

Fix HIP CannyEdgeDetector failure on repeated vxProcessGraph#1
simonCatBot wants to merge 1 commit into
developfrom
fix/hip-canny-stack-counter-reset

Conversation

@simonCatBot

Copy link
Copy Markdown
Owner

Summary

Fixes a HIP-specific bug where CannyEdgeDetector fails on the second (and subsequent) vxProcessGraph calls.

Problem

The HIP output synchronization for AGO_TYPE_CANNY_STACK reads the GPU atomic counter to obtain stackTop, but it never resets that counter back to zero. On the next graph execution the Canny suppression kernel sees the stale counter value, believes the stack already contains entries, and either:

  • writes past the stack buffer (Hip_CannySuppThreshold_U8XY_U16_3x3 uses an atomic increment on the same counter), or
  • produces hipErrorInvalidArgument / agoWaitForNodesCompletion: single node wait failed.

This makes CannyEdgeDetector unusable in any graph that is processed more than once.

Root cause

In amd_openvx/openvx/ago/ago_util_hip.cpp, agoGpuHipDataOutputAtomicSync() handles the Canny stack by doing:

hipMemcpyDtoH(&stack, data->hip_memory, sizeof(vx_uint32));
data->u.cannystack.stackTop = stack;
// read stack data if stackTop > 0
// ... but never reset the GPU counter

The OpenCL equivalent (agoGpuOclDataOutputAtomicSync()) already resets the counter to zero after reading via clEnqueueMapBuffer/clEnqueueUnmapMemObject. HIP was missing that reset.

Fix

After reading the Canny stack entries from GPU memory, reset the GPU counter to 0 with hipMemcpyHtoD. Also clamp stackTop to the declared stack capacity as a defensive guard against reading past the GPU buffer.

// reset the GPU counter to 0 so the next graph execution starts from a clean stack
// (matches the OpenCL path in agoGpuOclDataOutputAtomicSync)
if (data->hip_memory) {
    vx_uint32 zero = 0;
    hipError_t err = hipMemcpyHtoD(data->hip_memory, (void *)&zero, sizeof(vx_uint32));
    ...
}

Verification

Build configuration:

  • Host: avinya (AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S)
  • ROCm/HIP: 7.13
  • CMake: -DBACKEND=HIP -DGPU_SUPPORT=ON -DNEURAL_NET=OFF -DLOOM=OFF -DMIGRAPHX=OFF

Tests performed:

  1. Minimal Canny reproducer (20 iterations of vxProcessGraph) — passes after fix, failed on iteration 2 before.
  2. Khronos OpenVX CTS 1.3 Canny filter (vxuCanny + vxCanny) — 56/56 pass.
  3. Khronos OpenVX CTS 1.3 HarrisCorners filter — 433/433 pass (previously contaminated by the Canny failure).
  4. Full OpenVX CTS 1.3 suite — 5820/5820 pass with correct LD_LIBRARY_PATH.
  5. openvx-mark --vision-parity FHD — 41/41 pass, 0 skipped, 0 failed.

Performance:

  • CannyEdgeDetector at FHD: ~2.0 ms (was failing/skipped before).
  • No measurable regression in overall vision score compared to the same build without the fix.

Linked issue

Fixes ROCm#1693

Notes for reviewers

  • The change is localized to the HIP path only; CPU and OpenCL behavior are unchanged.
  • The clamp is conservative and only protects against malformed/overflow cases; in practice the reset is the functional fix.

The HIP Canny stack output synchronization read the GPU atomic counter
but never reset it to zero. On the second vxProcessGraph call the kernel
saw the previous stackTop, wrote past the buffer, and produced
'invalid argument' / 'single node wait failed' errors. The OpenCL path
already resets the counter after reading via map/unmap; this change
adds the equivalent hipMemcpyHtoD reset.

Also clamp stackTop to the buffer capacity as a defensive guard.

Fixes: ROCm#1693

Signed-off-by: Kiriti Gowda <kiriti.gowda@gmail.com>
@simonCatBot simonCatBot force-pushed the fix/hip-canny-stack-counter-reset branch from 28074e0 to 6eb3081 Compare June 20, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HIP CannyEdgeDetector fails on repeated vxProcessGraph: Canny stack GPU counter not reset

1 participant